Data Engineer
SouthRivers Data
About the Role
We are looking for a Data Engineer to design, build, and operate large-scale data pipelines and lakehouse platforms in an enterprise environment. You will work hands-on with AWS data services, Apache Spark, and Python to deliver reliable, performant, and well-modeled data products that power analytics and downstream applications.
Key Responsibilities
- Design, implement, and maintain robust, reliable, and scalable data pipelines for batch and large-scale processing workloads.
- Build and evolve a Data Lakehouse on AWS using cloud object storage (S3), open table formats, and distributed processing frameworks.
- Develop ETL/ELT workflows in Python and Apache Spark, ensuring performance, cost-efficiency, and maintainability.
- Model data for analytical and reporting use cases, applying dimensional and analytical modeling best practices.
- Write advanced SQL for transformation, optimization, and ad-hoc analysis across large datasets.
- Operate and optimize AWS data services such as EMR, Glue, Athena, and S3-based data lakes.
- Troubleshoot pipeline and platform issues end-to-end, applying system-level thinking to identify root causes and durable fixes.
- Collaborate with analysts, data scientists, and platform engineers to translate business requirements into technical solutions.
- Contribute to data quality, observability, governance, and CI/CD practices for data workloads.
Required Qualifications
- Strong, proven hands-on experience in data engineering within enterprise environments.
- Advanced SQL skills and a solid understanding of analytical and dimensional data modeling.
- Strong hands-on experience with Data Lakehouse or modern data platform concepts: cloud object storage, open table formats (e.g., Delta, Iceberg, Hudi), and distributed processing.
- Strong hands-on experience with AWS data services: EMR, Glue, Athena, and S3-based data lakes.
- Strong hands-on experience with Apache Spark for large-scale data processing.
- Strong Python skills for ETL development, data processing, and automation.
- Demonstrated experience designing, implementing, and maintaining robust and reliable data pipelines in production.
- Very strong analytical, problem-solving, and system-level thinking skills.
Nice to Have
- Experience with workflow orchestration tools (e.g., Airflow, Step Functions).
- Familiarity with infrastructure-as-code (Terraform, CloudFormation) and CI/CD for data.
- Exposure to data governance, cataloging (e.g., AWS Glue Data Catalog, Lake Formation), and data quality frameworks.
- Streaming experience (Kafka, Kinesis, Spark Structured Streaming).
Required skills and experience
| Apache Spark | 3 years |
| SQL | 3 years |
| AWS | 3 years |
Required languages
| English | B2 - Upper Intermediate |
Published 29 April
Average salary range of similar jobs in analytics: $2500-4000