Senior Data Engineer with Databricks and PySpark, Exposure Platform
Project overview
The Exposure Platform is a long-running enterprise data transformation initiative focused on replacing legacy SQL-based logic with modern, scalable data processing solutions. The platform handles large data volumes and supports multiple business domains through a shared, business-critical codebase. The work emphasises maintainability, performance, and strong software engineering practices across the organisation.
Team
You will work in a small, cross-functional data engineering team consisting of senior data engineers and software engineers. The team collaborates closely through code reviews, shared design discussions, and agreed engineering standards while contributing to a common codebase. The team is currently expanding and values structured collaboration and technical ownership.
Position overview
We are looking for a Senior Data Engineer to support a large-scale transformation from SQL Server-based systems to a Databricks and Delta Lake platform. You will focus on enterprise-grade data engineering and software development, building maintainable and scalable data processing solutions used by multiple teams. This role is not focused on analytics or reporting but on core data transformation and platform development.
Technology stack
Databricks, Delta Lake, Python, PySpark, SQL Server, Azure Data Factory, Azure DevOps, Git, CI/CD pipelines
Responsibilities
- Read, understand, and reason about complex SQL stored procedures and embedded business logic
- Redesign and implement existing SQL logic as clean and maintainable Python and PySpark code in Databricks
- Develop production-grade transformation code using reusable packages, modules, and components
- Apply software engineering best practices including clean code, object-oriented design, modularisation, and refactoring
- Design and evolve data models across Bronze, Silver, and Gold layers
- Work with very large data volumes and highly parallel, event-driven data transformations
- Participate actively in code reviews and technical design discussions
- Contribute to the stability, scalability, and long term maintainability of the shared data engineering codebase
Requirements
- Strong experience with Python and PySpark in production data engineering environments
- Hands on experience working with Databricks and Delta Lake
- Strong SQL skills with the ability to read, analyse, and translate complex stored procedures
- Experience working in large shared codebases beyond notebook-based development
- Solid understanding of object-oriented programming and software engineering principles
- Experience applying clean code practices, refactoring, and maintainable design
- Strong background in data modelling, including transactional and analytical models
- Experience working with layered data architectures such as Bronze, Silver, and Gold
- Ability to analyse existing code line by line and explain technical and business logic clearly
Nice to have
- Experience with Power BI for data consumption or validation
- Exposure to enterprise scale data platforms in complex organisational environments
Required skills and experience
| Skill | Experience |
| --- | --- |
| Databricks | 4 years |
| Delta Lake | 4 years |
| Python | 4 years |
| PySpark | 4 years |
| Azure Data Factory | 4 years |
Required languages
| Language | Level |
| --- | --- |
| English | B2 (Upper-Intermediate) |
| Ukrainian | Native |