Project Name

Sparklens visualisation extension - 2

Mentor Details

Name: 

Mayur Bhosale

Organisation:

Qubole

Designation:

Spark Developer

Project Description

Sparklens is an open-source Spark application profiler. Debugging and tuning a Spark application is tedious and requires a subject matter expert, and even then validating the suggestions can be expensive. Sparklens helps narrow down an application's bottlenecks and helps set the optimal number of executors using a built-in simulator. Currently, Sparklens writes its output to the command line (see https://github.com/qubole/sparklens#what-does-it-report for an example), which is not at all intuitive. We need a local static service that can take this output (internally it is a JSON file) and create a static web UI version of it.
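
To make the idea concrete, here is a minimal sketch of the JSON-to-static-page step. It is only an illustration: the actual Sparklens JSON schema is not described in this listing, so the `ReportSummary` and `StageSummary` shapes, their field names, and the sample data are all hypothetical placeholders.

```typescript
// Hypothetical shapes: these field names are placeholders for illustration,
// not the real Sparklens JSON schema.
interface StageSummary {
  stageId: number;
  wallClockPercent: number;
}

interface ReportSummary {
  appName: string;
  stages: StageSummary[];
}

// Escape text before placing it in HTML so report values cannot inject markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render the parsed report as a self-contained static HTML page.
function renderReport(report: ReportSummary): string {
  const rows = report.stages
    .map((s) => `<tr><td>${s.stageId}</td><td>${s.wallClockPercent.toFixed(1)}%</td></tr>`)
    .join("\n");
  return [
    "<!DOCTYPE html>",
    `<html><head><title>${escapeHtml(report.appName)}</title></head><body>`,
    `<h1>${escapeHtml(report.appName)}</h1>`,
    "<table><tr><th>Stage</th><th>Wall clock</th></tr>",
    rows,
    "</table></body></html>",
  ].join("\n");
}

// Made-up sample input standing in for the profiler's JSON output.
const sampleJson =
  '{"appName": "demo <job>", "stages": [' +
  '{"stageId": 0, "wallClockPercent": 62.5}, ' +
  '{"stageId": 1, "wallClockPercent": 37.5}]}';

const html = renderReport(JSON.parse(sampleJson) as ReportSummary);
console.log(html);
```

A real implementation would read the JSON file from disk and would likely use a charting library or one of the JS frameworks mentioned below; the sketch keeps everything as a pure string transformation so the rendering logic is easy to test in isolation.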

Programming Languages

Any JavaScript framework, Scala

Project Pre-requisites

Basic understanding of Git. If the student ends up working on the core Sparklens tasks, basic knowledge of Java/Scala is required.

Project Duration (in Months)

1.5 months

Number of openings

2

Project Difficulty

Moderate

Additional Information

Spark application tuning is a difficult problem, and many companies and projects are trying to tackle it. Here are a few references: https://www.pepperdata.com/, https://github.com/linkedin/dr-elephant
Looking into the performance side is a great way to understand the internals of a distributed system.

Proposal requirements

Research open-source distributed computing frameworks (Spark, Hive, Presto) and write a 2-3 line summary of each, explaining their pros and cons.

Have questions or feedback? Interested in working with us?  Email us at connectinternlink@gmail.com