Data Engineer

SouthRivers Data

About the Role

We are looking for a Data Engineer to design, build, and operate large-scale data pipelines and lakehouse platforms in an enterprise environment. You will work hands-on with AWS data services, Apache Spark, and Python to deliver reliable, performant, and well-modeled data products that power analytics and downstream applications.

Key Responsibilities

  • Design, implement, and maintain robust, reliable, and scalable data pipelines for batch and large-scale processing workloads.
  • Build and evolve a Data Lakehouse on AWS using cloud object storage (S3), open table formats, and distributed processing frameworks.
  • Develop ETL/ELT workflows in Python and Apache Spark, ensuring performance, cost-efficiency, and maintainability (an illustrative sketch of such a job follows this list).
  • Model data for analytical and reporting use cases, applying dimensional and analytical modeling best practices.
  • Write advanced SQL for transformation, optimization, and ad-hoc analysis across large datasets.
  • Operate and optimize AWS data services such as EMR, Glue, Athena, and S3-based data lakes.
  • Troubleshoot pipeline and platform issues end-to-end, applying system-level thinking to identify root causes and durable fixes.
  • Collaborate with analysts, data scientists, and platform engineers to translate business requirements into technical solutions.
  • Contribute to data quality, observability, governance, and CI/CD practices for data workloads.
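For illustration only, here is a minimal sketch of the kind of batch ETL job these responsibilities describe: a PySpark application that reads raw data from S3, applies light cleansing, and writes a partitioned, curated dataset back to S3. All bucket names, paths, and column names below are hypothetical placeholders, not details of this role.

    # Minimal PySpark batch ETL sketch; all paths and columns are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("orders-daily-batch")  # hypothetical job name
        .getOrCreate()
    )

    # Read raw landing-zone data from S3 (a production job would pin an explicit schema).
    raw = spark.read.json("s3://example-landing/orders/2024-04-29/")

    # Basic cleansing and typing before loading the curated layer.
    curated = (
        raw
        .dropDuplicates(["order_id"])
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .withColumn("order_date", F.to_date("order_ts"))
        .filter(F.col("order_id").isNotNull())
    )

    # Write to the curated zone, partitioned by date; an open table format
    # (Delta, Iceberg, Hudi) would typically replace plain Parquet here.
    (
        curated.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated/orders/")
    )

In practice a job like this would run on EMR or Glue and be scheduled by an orchestrator such as Airflow or Step Functions, as reflected in the sections below.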

Required Qualifications

  • Proven hands-on experience in data engineering within enterprise environments.
  • Advanced SQL skills and a solid understanding of analytical and dimensional data modeling.
  • Strong hands-on experience with Data Lakehouse or modern data platform concepts: cloud object storage, open table formats (e.g., Delta, Iceberg, Hudi), and distributed processing.
  • Strong hands-on experience with AWS data services: EMR, Glue, Athena, and S3-based data lakes.
  • Strong hands-on experience with Apache Spark for large-scale data processing.
  • Strong Python skills for ETL development, data processing, and automation.
  • Demonstrated experience designing, implementing, and maintaining robust and reliable data pipelines in production.
  • Very strong analytical, problem-solving, and system-level thinking skills.

Nice to Have

  • Experience with workflow orchestration tools (e.g., Airflow, Step Functions).
  • Familiarity with infrastructure-as-code (Terraform, CloudFormation) and CI/CD for data.
  • Exposure to data governance, cataloging (e.g., AWS Glue Data Catalog, Lake Formation), and data quality frameworks.
  • Streaming experience (Kafka, Kinesis, Spark Structured Streaming).

Required Skills and Experience

  • Apache Spark: 3 years
  • SQL: 3 years
  • AWS: 3 years

Required Languages

  • English: B2 (Upper Intermediate)
Published: 29 April