For a project with our client 1010data, we are looking for a Data Product Engineer.
1010data travels at the speed of thought to make Big Data discovery easy, powering sub-second responses to analyses run on billions of rows of data. 1010data is defining the way the world interacts with data.
An essential tool for more than 700 of the world’s top retail, manufacturing, telecom, government, and financial services enterprises, including Shell, Nespresso, Dollar General, P&G, and Rite Aid, the 1010data platform is a highly differentiated product that is becoming the industry standard for Big Data Discovery and Data Sharing. With more than 30 trillion rows of data in its private cloud, 1010data is designed to scale to the largest volumes of granular data, the most disparate and varied data sets, and the most complex advanced analytics, all while delivering lightning-quick system performance.
As a Data Product Engineer at 1010data, you will be responsible for designing, maintaining, and optimizing an ELT process that incorporates several industry-standard data orchestration tools and in-house proprietary querying, scheduling, and automation tools. This pipeline supports a client-facing application, so reliability, efficiency, and accuracy are all critical. You will own the pipeline end-to-end, with an understanding of the technical and business requirements that inform every design decision. Accordingly, in addition to direct design and development, you will provide a product-level perspective for other engineers, analysts, and account-management teams to ensure the codebase continues to develop in an efficient, sustainable manner. As we incorporate more cloud technologies into our processes, you will be at the forefront of exploring and defining best practices and helping us transition our product to be more scalable.
Different parts of the existing ELT pipeline run on Apache Spark and Qubole, on our in-house automation technology, and on our proprietary platform, all coordinated using Apache Airflow. Accordingly, you will master and move between multiple languages: Python, Scala, and our proprietary query language. Our query engine, query language, database, and data storage layer were all developed and fine-tuned in-house over the lifetime of the company. You will be formally trained both in the overall 1010 architecture and in the 1010 query language.
An ideal candidate will have at least two years of experience working with Apache Spark, Apache Airflow, or both, as well as experience building and maintaining an ELT pipeline that supports a client-facing application in a production environment. Because part of the pipeline consists of queries against our proprietary database, professional experience writing high-performance queries in SQL or another query language is also highly preferred.
What you will take on:
End-to-end ownership of every step in a complex data pipeline, in multiple languages including our proprietary query language
Understanding the business logic of every step in the process, allowing you and other team members to make informed decisions about design changes and improvements
Designing and writing Apache Spark scripts to preprocess terabytes of data for ingestion into our pipeline
Designing, writing and maintaining Apache Airflow jobs that coordinate between multiple technologies
Improving and expanding the product’s features by writing high-efficiency transformations in our proprietary query language and integrating those queries into the pipeline
Ensuring quality, reliability, and uptime for critical automated processes, including helping the data quality team diagnose and resolve issues in the pipeline and in the data
Helping to migrate our products and processes into the cloud and reduce our in-house data center footprint
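To give a flavor of the coordination work described above, the sketch below models, in plain Python, the dependency-ordered task execution that Apache Airflow provides: an extract step feeds a transform, which feeds a load. All task names and the stand-in logic are invented for illustration; the real pipeline uses Airflow DAGs together with Spark and 1010data's proprietary tooling.

```python
# Toy sketch of dependency-ordered pipeline execution (the pattern an
# Airflow DAG expresses). Task names and bodies are illustrative only.
from graphlib import TopologicalSorter


def extract(state):
    # Stand-in for Spark preprocessing of raw input data.
    state["raw"] = [3, 1, 2]


def transform(state):
    # Stand-in for a high-efficiency transformation in a query language.
    state["clean"] = sorted(state["raw"])


def load(state):
    # Stand-in for loading results into the client-facing platform.
    state["loaded"] = len(state["clean"])


# Each task maps to the set of tasks it depends on.
dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": extract, "transform": transform, "load": load}

state = {}
order = list(TopologicalSorter(dag).static_order())
for name in order:
    tasks[name](state)

print(order)  # tasks run in dependency order: extract, transform, load
```

In Airflow itself, the same chain would be declared with operators and dependency arrows (e.g. `extract >> transform >> load`), with the scheduler handling retries, uptime, and coordination across the different technologies.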
What you already have:
At least 2 years of professional experience programming in Python and/or Scala
At least 1 year of experience maintaining a client-facing pipeline in a production environment
Exposure to basic database concepts
Good understanding of data engineering, NoSQL databases and database design, distributed systems, and/or information retrieval
Knowledge of Apache Airflow
Knowledge of Apache Spark
Experience writing high-performance queries in SQL or another query language
Experience with SaaS products
Ability to plan and collect requirements for projects, and interact with the analyst and data science teams
STEM Bachelor’s degree required; a graduate degree is a big plus
As an Open Source technology scale-up, we build innovative, tailor-made solutions for companies. Our Open Source experts can also be hired as consultants within your company.
The company culture is based on the Inukshuk.
It is a symbol of leadership, encouraging the importance of friendship and reminding us of our dependence upon one another. The Inukshuk functions as a compass and guides our clients through the wilderness of the ICT landscape. Inuits provides a safe and easy path for those who join our Open Source community. If you want to experience our trust, nourishment, and reassurance, come and visit our Igloo. Our Inukshuk will guide you!
Job posted on
10 May 2021