Python+Spark Developer
Practical experience with Airflow is highly desired. After that, the EPAM data team is expected to work on the Core KPIs platform hosted in AWS (S3, EMR, Spark/PySpark, etc.). The ETL jobs contain some complex business logic, and the goal is to migrate and optimize data flows while focusing on improving data quality. This is potentially a long-term engagement, but it depends on the delivery results for the initial scope.
You will collaborate with Data Scientists, Product Managers, Executives, and other key stakeholders around the world. In this role, you will leverage your broad knowledge, skills, and experience to understand data requirements and build the systems and platform that help unleash insights. You will have a direct impact on the insights that are used to create delightfully smart, personalized, and revolutionary customer experiences.
Responsibilities
Apply your broad knowledge of technology options, technology platforms, design techniques, and approaches across the data warehouse lifecycle phases to design an integrated, high-quality solution that addresses requirements
Ensure completeness and compatibility of the technical infrastructure required to support system performance, availability and architecture requirements
Design and plan for the integration of all data warehouse technical components
Provide input and recommendations on technical issues to the team
Take responsibility for data design, data extracts, and transforms
Develop implementation and operation support plans
Lead architecture design and implementation of a next-generation BI solution
Build robust and scalable data integration (ETL) pipelines using AWS services, EMR, Python, Pig, and Spark (see the PySpark sketch after this list)
Mentor and develop junior Data Engineers
Build and deliver high quality data architecture to support Business Analysts, Data Scientists and customer reporting needs
Interface with other technology teams to extract, transform, and load data from a wide variety of data sources
Continually improve ongoing reporting and analysis processes, automating or simplifying self-service support for customers
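As a rough illustration of the kind of pipeline work described above, here is a minimal PySpark ETL sketch. The bucket paths, column names, and aggregation logic are hypothetical placeholders, not details of the actual Core KPIs platform.

# Minimal PySpark ETL sketch (hypothetical paths and columns, for illustration only).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("core-kpi-etl-example").getOrCreate()

# Extract: read raw event data from S3 (placeholder path).
events = spark.read.parquet("s3://example-bucket/raw/events/")

# Transform: apply business logic -- here, a simple daily aggregation per customer.
daily_kpis = (
    events
    .filter(F.col("event_type") == "purchase")
    .withColumn("event_date", F.to_date("event_timestamp"))
    .groupBy("customer_id", "event_date")
    .agg(
        F.count("*").alias("purchase_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Load: write the aggregated KPIs back to S3, partitioned by date.
daily_kpis.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_kpis/"
)

spark.stop()

In practice such a job would typically run on EMR via spark-submit or an EMR step, with the transformation logic factored out so it can be unit-tested separately from the read/write boundaries.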
Qualifications
Bachelor's degree in Computer Science required; Master’s degree preferred
7+ years of relevant experience in one of the following areas: Big Data Engineering, Data Warehousing, Business Intelligence, or Business Analytics
7+ years of hands-on experience writing complex, highly optimized SQL queries across large data sets
Demonstrated strength in data modeling, ETL development, and Data Warehousing
Experience with AWS services including S3, EMR, Kinesis and RDS
Experience with the big data technology stack, including Hadoop, HDFS, Hive, Spark, Pig, and Presto
Experience delivering end-to-end projects independently
Experience using Airflow, creating and maintaining DAGs, Operators, and Hooks (see the DAG sketch after this list)
Knowledge of distributed systems as it pertains to data storage and computing
Exceptional problem-solving and analytical skills
Knowledge of software engineering best practices across the development lifecycle, including Agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations
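To make the Airflow expectation concrete, here is a minimal DAG sketch assuming Airflow 2.x; the DAG ID, task IDs, schedule, and bash commands are hypothetical placeholders, not part of the actual project.

# Minimal Airflow DAG sketch (hypothetical IDs and commands, for illustration only; assumes Airflow 2.x import paths).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2020, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Hypothetical extract step: land raw data in S3.
    extract = BashOperator(
        task_id="extract_raw_data",
        bash_command="echo 'extract raw data to S3'",
    )

    # Hypothetical transform step: submit a Spark job (e.g. on EMR).
    transform = BashOperator(
        task_id="run_spark_transform",
        bash_command="echo 'spark-submit daily_kpi_job.py'",
    )

    # Run the transform only after the extract succeeds.
    extract >> transform

Real pipelines would typically replace the BashOperator placeholders with provider operators or custom Operators and Hooks, which is the kind of DAG maintenance experience the qualification above refers to.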