Project Name

Spark Streaming S3-SQS Connector

Mentor Details


Abhishek Dixit




Member Technical Staff

Project Description

Spark Structured Streaming is one of the most popular frameworks for building streaming applications today. Among its supported data sources, it can read files from cloud object stores such as Amazon S3. However, continuously reading files from S3 buckets can be costly, because discovering new files requires repeated, expensive List API requests that also add latency. To address this, we built a Spark Streaming S3-SQS connector that uses Amazon SQS to identify new files and so avoids the expensive List API requests on S3. This project covers key feature enhancements to the S3-SQS connector.
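As a rough illustration of the idea (not the connector's actual code, which lives in Apache Bahir and is written in Scala), the sketch below shows how new file paths can be read out of an S3 event notification delivered through SQS instead of listing the bucket. Python is used only for brevity. The function name `new_files_from_message` and the sample bucket and key are hypothetical; the JSON field names follow the S3 event notification message structure.

```python
import json
from typing import List
from urllib.parse import unquote_plus


def new_files_from_message(body: str) -> List[str]:
    """Return s3:// paths for objects reported as created in one SQS message body.

    Hypothetical helper for illustration; not part of the S3-SQS connector.
    """
    event = json.loads(body)
    paths = []
    # Messages without a "Records" array simply yield no paths.
    for record in event.get("Records", []):
        # Only object-creation events describe new files to read.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        paths.append(f"s3://{bucket}/{key}")
    return paths


# Hypothetical sample message: one created object and one removed object.
sample_body = json.dumps({
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/part+0001.json", "size": 1024},
            },
        },
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/old.json"},
            },
        },
    ]
})

new_paths = new_files_from_message(sample_body)
print(new_paths)
```

Each message names the new objects directly, so the reader never has to issue a List request to find out what has arrived.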

Key Objectives:
* Key feature improvements, such as support for SNS messages (bootstrap)
* Feature improvements to make the S3-SQS connector compatible with Hadoop 3
* Building a test framework for the S3-SQS connector
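To make the SNS objective more concrete, here is a minimal sketch under one assumption: that "support for SNS messages" refers to the setup where S3 publishes to an SNS topic that fans out to the SQS queue. In that setup (without raw message delivery) the queue message is an SNS envelope whose `Message` field holds the original S3 event as a JSON string. The helper name `unwrap_sns` and the sample payload are hypothetical, and Python is used only for brevity.

```python
import json


def unwrap_sns(body: str) -> dict:
    """Return the inner event, unwrapping an SNS envelope if one is present.

    Hypothetical helper for illustration; not part of the S3-SQS connector.
    """
    outer = json.loads(body)
    # An SNS notification carries Type == "Notification" and a string "Message".
    if outer.get("Type") == "Notification" and isinstance(outer.get("Message"), str):
        return json.loads(outer["Message"])
    # Otherwise assume the body is already a direct S3 event.
    return outer


# Hypothetical S3 event, once delivered directly and once wrapped by SNS.
s3_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/a.json"},
            },
        }
    ]
}
direct_body = json.dumps(s3_event)
wrapped_body = json.dumps({"Type": "Notification", "Message": json.dumps(s3_event)})

direct_event = unwrap_sns(direct_body)
wrapped_event = unwrap_sns(wrapped_body)
print(direct_event == wrapped_event)
```

Normalising both shapes to the same inner event means the rest of the file-discovery logic does not need to know which delivery path was used.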

Necessary Requirements:
Prior development experience with any object-oriented programming language (Java, Scala, etc.)

Preferred Requirements:
* Knowledge of, or development experience with, Scala
* Experience writing unit test frameworks in Java or Scala
* Basic knowledge of Spark and Spark Structured Streaming

The project will give hands-on experience writing production-level code, an understanding of key Amazon cloud offerings such as S3 and SQS, and insight into the workings of big data and streaming systems. It will also provide an opportunity to contribute to Apache Bahir, the principal repository for external connectors for Apache Spark.

Programming Languages


Project Pre-requisites

Prior development experience with any object-oriented programming language (Java, Scala, etc.)

Project Duration (in Months)

1 month

Number of openings


Project Difficulty


Additional Information

Proposal requirements

Please share GitHub links to your existing projects.
