Sparklens visualisation extension - 2
Sparklens is an open-source Spark application profiler. Debugging and tuning a Spark application is tedious and usually requires a subject matter expert, and even then validating the suggestions can be expensive. Sparklens helps narrow down the bottlenecks of an application and helps set the optimal number of executors using a built-in simulator. Currently, Sparklens writes its output to the command line as plain text, which is not at all intuitive. We need a local static service that can take this output (internally it is a JSON file) and create a static web UI version of it.
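As a rough illustration of the "JSON in, static page out" idea, here is a minimal Python sketch. The field names (`appName`, `simulations`, `executors`, `estimatedSeconds`) and the function `render_report` are hypothetical, chosen only for this example; the actual Sparklens JSON layout may be quite different, and the real project would likely use a JS framework instead.

```python
import html
import json


def render_report(report: dict) -> str:
    """Render a profiler report (hypothetical schema) as a static HTML page."""
    # Escape every value so that report content cannot inject markup.
    app_name = html.escape(str(report.get("appName", "unknown application")))
    rows = []
    for sim in report.get("simulations", []):
        rows.append(
            "<tr><td>{}</td><td>{}</td></tr>".format(
                html.escape(str(sim["executors"])),
                html.escape(str(sim["estimatedSeconds"])),
            )
        )
    return (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\"><title>{0}</title></head><body>\n"
        "<h1>{0}</h1>\n"
        "<table>\n"
        "<tr><th>Executors</th><th>Estimated time (s)</th></tr>\n"
        "{1}\n"
        "</table>\n"
        "</body></html>\n"
    ).format(app_name, "\n".join(rows))


# Hypothetical sample input; the real Sparklens JSON layout may differ.
sample_json = (
    '{"appName": "demo-job", "simulations": ['
    '{"executors": 4, "estimatedSeconds": 310}, '
    '{"executors": 8, "estimatedSeconds": 190}]}'
)
page = render_report(json.loads(sample_json))
```

The resulting string can be written to a file such as `report.html` and opened directly in a browser, with no server-side component required.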
Any of the JS frameworks, Scala
Basic understanding of Git. If the student ends up working on the core/Sparklens-related tasks, basic knowledge of Java/Scala is required.
Project Duration (in Months)
Number of openings
Spark application tuning is a difficult problem, and many companies and projects are trying to tackle it. Here are a few references:
Looking into the performance side is a great way to understand the internals of a distributed system.
Try to research open-source distributed computing frameworks such as Spark, Hive, and Presto, and write a 2-3 line summary of each explaining their pros and cons.