Data Engineer with Databricks
We are seeking an experienced Data Engineer with deep expertise in Databricks to design, build, and maintain scalable data pipelines and analytics solutions. This role requires at least 5 years of hands-on data engineering experience, with a strong focus on the Databricks platform.
Key Responsibilities:
- Data Pipeline Development & Management
  - Design and implement robust, scalable ETL/ELT pipelines using Databricks and Apache Spark
  - Process large volumes of structured and unstructured data
  - Develop and maintain data workflows using Databricks Workflows, Apache Airflow, or similar orchestration tools
  - Optimize data processing jobs for performance, cost efficiency, and reliability
  - Implement incremental data processing patterns and change data capture (CDC) mechanisms
- Databricks Platform Engineering
  - Build and maintain Delta Lake tables and implement medallion architecture (bronze, silver, gold layers)
  - Develop streaming data pipelines using Structured Streaming and Delta Live Tables
  - Manage and optimize Databricks clusters for various workloads
  - Implement Unity Catalog for data governance, security, and metadata management
  - Configure and maintain Databricks workspace environments across development, staging, and production
- Data Architecture & Modeling
  - Design and implement data models optimized for analytical workloads
  - Create and maintain data warehouses and data lakes on cloud platforms (Azure, AWS, or GCP)
  - Implement data partitioning, indexing, and caching strategies for optimal query performance
  - Collaborate with data architects to establish best practices for data storage and retrieval patterns
- Performance Optimization & Monitoring
  - Monitor and troubleshoot data pipeline performance issues
  - Optimize Spark jobs through proper partitioning, caching, and broadcast strategies
  - Implement data quality checks and automated testing frameworks
  - Manage cost optimization through efficient resource utilization and cluster management
  - Establish monitoring and alerting systems for data pipeline health and performance
- Collaboration & Best Practices
  - Work closely with data scientists, analysts, and business stakeholders to understand data requirements
  - Implement version control using Git and follow CI/CD best practices for code deployment
  - Document data pipelines, data flows, and technical specifications
  - Mentor junior engineers on Databricks and data engineering best practices
  - Participate in code reviews and contribute to establishing team standards
Required Qualifications:
- Experience & Skills
  - 5+ years of experience in data engineering with hands-on Databricks experience
  - Strong proficiency in Python and/or Scala for Spark application development
  - Expert-level knowledge of Apache Spark, including Spark SQL, DataFrames, and RDDs
  - Deep understanding of Delta Lake and Lakehouse architecture concepts
  - Experience with SQL and database optimization techniques
  - Solid understanding of distributed computing concepts and data processing frameworks
  - Proficiency with cloud platforms (Azure, AWS, or GCP) and their data services
  - Experience with data orchestration tools (Databricks Workflows, Apache Airflow, Azure Data Factory)
  - Knowledge of data modeling concepts for both OLTP and OLAP systems
  - Familiarity with data governance principles and tools like Unity Catalog
  - Understanding of streaming data processing and real-time analytics
  - Experience with version control systems (Git) and CI/CD pipelines
Preferred Qualifications:
- Databricks Certified Data Engineer certification (Associate or Professional)
- Experience with machine learning pipelines and MLOps on Databricks
- Knowledge of data visualization tools (Power BI, Tableau, Looker)
- Experience with infrastructure as code (Terraform, CloudFormation)
- Familiarity with containerization technologies (Docker, Kubernetes)
Required skills experience:
- Databricks: 3 years
Required languages:
- English: B2 (Upper Intermediate)
Published 8 October
Average salary range of similar jobs in analytics: $4000-6200