This project is a migration of customer sales, provisioning, and usage data collected across a number of products. The initial task is to migrate ETL jobs from Oozie to Apache Airflow, which should provide a good understanding of the data flows within the platform.
Practical experience with Airflow is highly desired. After that, the EPAM data team will work on the Core KPIs platform, which is hosted in AWS (S3, EMR, Spark/PySpark, etc.). The ETL jobs contain complex business logic, and the goal is to migrate and optimize the data flows while improving data quality. This is potentially a long-term engagement, depending on the delivery results for the initial scope.
You will collaborate with Data Scientists, Product Managers, Executives, and other key stakeholders around the world. In this role, you will leverage your broad knowledge, skills, and experience to understand data requirements and build the systems and platform that help unleash insights. You will have a direct impact on the insights used to create smart, personalized, and innovative customer experiences.
Apply your broad knowledge of technology options, platforms, design techniques, and approaches across the data warehouse lifecycle phases to design an integrated, high-quality solution that addresses the requirements
Ensure completeness and compatibility of the technical infrastructure required to support system performance, availability and architecture requirements
Design and plan the integration of all data warehouse technical components
Provide input and recommendations on technical issues to the team
Take responsibility for data design, data extracts, and transforms
Develop implementation and operation support plans
Lead the architecture design and implementation of a next-generation BI solution
Build robust and scalable data integration (ETL) pipelines using AWS services, EMR, Python, Pig, and Spark
Mentor and develop junior data engineers
Build and deliver high quality data architecture to support Business Analysts, Data Scientists and customer reporting needs
Interface with other technology teams to extract, transform, and load data from a wide variety of data sources
Continually improve ongoing reporting and analysis processes, automating or simplifying self-service support for customers
Bachelor's degree in Computer Science required; Master’s degree preferred
7+ years of relevant experience in one of the following areas: Big Data Engineering, Data Warehousing, Business Intelligence, or Business Analytics
7+ years of hands-on experience writing complex, highly optimized SQL queries across large data sets
Demonstrated strength in data modeling, ETL development, and Data Warehousing
Experience with AWS services including S3, EMR, Kinesis and RDS
Experience with the big data technology stack, including Hadoop, HDFS, Hive, Spark, Pig, and Presto
Experience with delivering end-to-end projects independently
Experience using Airflow, including creating and maintaining DAGs, Operators, and Hooks
Knowledge of distributed systems as it pertains to data storage and computing
Exceptional problem-solving and analytical skills
Knowledge of software engineering best practices across the development lifecycle, including Agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations
About EPAM Systems
EPAM Systems is a leading global provider of digital platform engineering and software development services, with more than 36,700 employees worldwide.
Job posted on 19 November 2020