You are going to build innovative data pipelines for processing and analyzing the client’s large user datasets (250+ billion events per month). A unique challenge of this role is being comfortable developing with varied technologies: custom transformation/integration apps in Python and Java, pipelines in Spark, Kafka and Kinesis, and transformation and analysis in SQL.
Responsibilities:
- Develop ETL (Extract, Transform, Load) data pipelines in Spark, Kinesis, Kafka and custom Python apps to transfer massive amounts of data (over 20 TB/month) efficiently between systems
- Engineer complex, efficient, distributed data transformation solutions using Python, Java, Scala and SQL
- Productionize machine learning models, using resources efficiently in a clustered environment
- Research, plan, design, develop, document, test, implement and support proprietary software applications
- Perform analytical data validation to ensure the accuracy and completeness of reported business metrics
- Be open to taking on, learning and implementing engineering projects outside your core competency
- Understand the business problem and engineer/architect/build an efficient, cost-effective and scalable technology infrastructure solution
- Monitor system performance after implementation and iteratively devise solutions to improve performance and user experience
- Research and develop new data product ideas to grow the client’s revenue opportunities and contribute to the company’s intellectual property
Requirements:
- 3+ years of experience developing in Python to transform large datasets on distributed and clustered infrastructure
- 5+ years of experience engineering ETL data pipelines for big data systems
- Proficiency in SQL, including experience performing data transformations and data analysis in SQL
- Comfortable juggling multiple technologies and high-priority tasks
Nice to have:
- BS or higher degree in computer science, engineering or other related field
- 5+ years of object-oriented programming experience in languages such as Java, Scala or C++
- Prior experience designing and building ETL infrastructure involving streaming systems such as Kafka, Spark and AWS Kinesis
- Experience implementing clustered, distributed or multi-threaded infrastructure to support machine learning processing on Spark or SageMaker
- Experience with distributed columnar databases such as Vertica, Greenplum, Redshift or Snowflake
Success in this role:
- Demonstrate a passion for data
- Eagerness to research and learn new technologies in order to develop creative and efficient ways to solve business problems
- Take full responsibility for the initiative
- Stay focused on the successful implementation of the task at hand before moving on to the next engineering challenge
- Go above and beyond: while engineering for current tasks, think about the big picture and how to adjust code bases and processes. Look for ways to make systems more robust and fault tolerant, monitor for failures, and build in automated recovery
We offer:
- Opportunity to work on bleeding-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Medical insurance
- Benefits program
- Corporate social events
Grid Dynamics is the engineering services company known for transformative, mission-critical cloud solutions for retail, finance and technology sectors. We architected some of the busiest e-commerce services on the Internet and have never had an outage during the peak season. Founded in 2006 and headquartered in San Ramon, California with offices throughout the US and Eastern Europe, we focus on big data analytics, scalable omnichannel services, DevOps, and cloud enablement.
About Grid Dynamics
Grid Dynamics is a leading provider of technology consulting, agile co-creation and scalable engineering and data science services for Fortune 500 corporations undergoing digital transformation. We work in close collaboration with our clients on digital transformation initiatives that span strategy consulting, early prototypes and enterprise-scale delivery of new digital platforms. We help organizations become more agile and create innovative digital products and experiences using deep expertise in emerging technology, top global engineering talent, lean software development practices, and a high-performance product culture. Headquartered in Silicon Valley with over 1,200 technologists located in engineering delivery centers throughout the US, Central and Eastern Europe, Grid Dynamics is known for architecting and delivering some of the largest digital transformation programs in the retail, technology and financial sectors to help its clients win market share, shorten time to market and reduce costs of digital operations on a massive scale.