Project Name
Spark Streaming S3-SQS Connector
Mentor Details
Name:
Abhishek Dixit
Organisation:
Qubole
Designation:
Member Technical Staff
Project Description
Spark Structured Streaming is one of the most popular frameworks for building streaming applications. Among its supported data sources, it can read files from cloud object stores such as Amazon S3. However, continuously reading files from S3 buckets can be costly, because discovering new files relies on expensive List API requests that also add latency. To address this, we built a Spark Streaming S3-SQS connector that uses Amazon SQS to identify new files and avoids expensive List API requests on S3. This project covers key feature enhancements to the S3-SQS connector.
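The difference between the two discovery strategies can be sketched with a small in-memory model. This is only an illustration under stated assumptions: FakeBucket, Discovery, and Demo are hypothetical names invented here, no real AWS or Spark APIs are called, and the actual connector may be structured quite differently.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-ins for an S3 bucket and an SQS queue.
// These are illustration-only names, not the connector's classes.
final class FakeBucket(queue: mutable.Queue[String]) {
  private val keys = mutable.LinkedHashSet.empty[String]
  var listCalls = 0 // counts simulated List API requests

  // Uploading a file also publishes an event notification to the queue.
  def put(key: String): Unit = {
    keys += key
    queue.enqueue(key)
  }

  // Simulated List API call: returns every key in the bucket.
  def listAll(): Seq[String] = {
    listCalls += 1
    keys.toSeq
  }
}

object Discovery {
  // List-based discovery: each poll lists the bucket and filters out known keys.
  def byListing(bucket: FakeBucket, seen: mutable.Set[String]): Seq[String] = {
    val fresh = bucket.listAll().filterNot(seen.contains)
    seen ++= fresh
    fresh
  }

  // Queue-based discovery: each poll only drains pending notifications.
  def byQueue(queue: mutable.Queue[String]): Seq[String] =
    queue.dequeueAll(_ => true)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val queue = mutable.Queue.empty[String]
    val bucket = new FakeBucket(queue)
    bucket.put("logs/a.json")
    bucket.put("logs/b.json")

    // New files are found from notifications alone, without listing the bucket.
    println(Discovery.byQueue(queue))
    println(bucket.listCalls)
  }
}
```

In this model, list-based discovery issues one List request per poll regardless of whether anything changed, while queue-based discovery touches only the files that were actually added.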
Key Objectives:
* Key feature improvements, such as support for SNS messages (bootstrap)
* Feature improvements to make the S3-SQS connector compatible with Hadoop 3
* Building a test framework for the S3-SQS connector
Necessary Requirements:
Prior development experience with any object-oriented programming language (Java, Scala, etc.)
Preferred Requirements:
* Knowledge of, or development experience with, Scala
* Experience writing unit test frameworks in Java or Scala
* Basic knowledge of Spark and Spark Structured Streaming is helpful
The project offers hands-on experience writing production-level code, an understanding of key Amazon cloud offerings such as S3 and SQS, and insight into how big data and streaming systems work. It also provides an opportunity to contribute to Apache Bahir, which hosts external connectors for Apache Spark.
Programming Languages
Scala
Project Pre-requisites
Prior development experience with any object-oriented programming language (Java, Scala, etc.)
Project Duration (in Months)
1 month
Number of openings
1
Project Difficulty
Moderate
Additional Information
GitHub:
Proposal requirements
Please share GitHub links to your existing projects.