Sparklens visualisation extension
Distributed Systems Engineer
Sparklens is an open-source Spark application profiler. Debugging and tuning a Spark application is a tedious task that requires a subject matter expert, and even then validating the suggestions can turn out to be an expensive affair. Sparklens helps narrow down the bottlenecks of an application and also helps set the optimal number of executors using a built-in simulator. Currently, Sparklens writes its output to the command line, which is not at all intuitive. We need a local static service that can take this output (internally it's a JSON file) and create a static web UI version of it.
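To make the idea concrete, here is a minimal sketch of the JSON-to-static-page step, written in Python only for brevity (the project itself would likely use a JS framework). It assumes nothing about the actual Sparklens JSON schema: it renders any JSON document as nested HTML tables and lists. The function names `render_value`, `render_page`, and `write_report`, and the file paths passed to them, are hypothetical and not part of Sparklens.

```python
import html
import json


def render_value(value):
    """Render any JSON value as an HTML fragment, escaping all text."""
    if isinstance(value, dict):
        # One table row per key; values are rendered recursively.
        rows = "".join(
            f"<tr><th>{html.escape(str(key))}</th><td>{render_value(val)}</td></tr>"
            for key, val in value.items()
        )
        return f"<table>{rows}</table>"
    if isinstance(value, list):
        items = "".join(f"<li>{render_value(item)}</li>" for item in value)
        return f"<ul>{items}</ul>"
    if isinstance(value, str):
        return html.escape(value)
    # Numbers, booleans and null keep their JSON spelling.
    return html.escape(json.dumps(value))


def render_page(report, title="Sparklens report"):
    """Wrap the rendered report in a complete, standalone HTML page."""
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>'
        f"<body><h1>{safe_title}</h1>{render_value(report)}</body></html>"
    )


def write_report(json_path, html_path):
    """Read a JSON file (path is an assumption) and write a static HTML page."""
    with open(json_path, encoding="utf-8") as src:
        report = json.load(src)
    with open(html_path, "w", encoding="utf-8") as dst:
        dst.write(render_page(report))
```

Calling `write_report("report.json", "report.html")` with a JSON file of your own produces a single page that can be opened directly in a browser. A real UI would replace the generic tables with charts and multi-page navigation once the actual output format has been studied.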
Expected outcome -
A locally running web UI, modeled on a reference UI, in which the user is able to navigate through the pages. Apart from what the reference UI shows, there are additional components which need to be added.
If time permits, and the student is interested, we can take up the enhancements in Sparklens core as well and submit them to the open source project.
Any of the JS frameworks, Scala
Basic understanding of Git. If the student ends up working on the core Sparklens tasks, basic knowledge of Java/Scala is required.
Project Duration (in Months)
Number of openings
Spark application tuning is a difficult problem, and there are many companies/projects which are trying to tackle it. Here are a few references:
Looking into the performance side is a great way to understand the internals of a distributed system.
Try to research a bit about open source distributed computing frameworks (Spark, Hive, Presto) and try to write a 2-3 line summary of each, explaining their pros and cons.