Junior/Middle Data Engineer (IRC274101)
Job Description
- Strong experience in data pipeline development and ETL/ELT processes.
- Proficiency with Apache Airflow for workflow orchestration.
- Hands-on experience with object storage solutions, preferably MinIO.
- Expertise in SQL and database management, specifically PostgreSQL.
- Experience with graph databases like Neo4j.
- Familiarity with vector databases such as Qdrant.
- Ability to work with large, diverse datasets and ensure data integrity.
- Solid expertise in SQL and relational databases.
- Experience in database design and optimization.
- Experience with NoSQL databases (MongoDB, Cosmos DB, etc.) for handling unstructured and semi-structured data.
- Experience contributing to release management following CI/CD best practices.
Job Responsibilities
- Design, develop, and maintain robust and scalable data pipelines for ingesting, transforming, and loading diverse datasets.
- Implement ETL/ELT processes to cleanse, validate, and enrich raw data into query-optimized formats.
- Orchestrate data workflows using Apache Airflow, including scheduling jobs and managing dependencies.
- Manage and optimize data storage solutions in MinIO (object storage) and PostgreSQL (relational data).
- Ensure data integrity, quality, and compliance throughout the data lifecycle.
- Collaborate with cross-functional teams to understand data requirements and deliver data solutions that enable advanced analytics and AI/ML initiatives.
- Troubleshoot and resolve data-related issues, ensuring high availability and performance of data systems.
Department/Project Description
Our client is focused on developing a robust and versatile data ingestion pipeline and associated schema designed to efficiently and accurately collect, process, analyze, and manage diverse data types from various sources in real time or near real time. This pipeline will automate and enhance data workflows, ensure data quality, and support advanced analytical capabilities including NLP, face recognition, and OCR.
As a Middle Data Engineer on the project, you will play a crucial role in managing deployment, infrastructure, automation, and monitoring. You will be instrumental in setting up and maintaining CI/CD pipelines, managing cloud resources, ensuring system stability and performance, and implementing robust logging and alerting mechanisms for the client platform. If you seek a challenge and want to impact the way the world distributes products from manufacturers to store shelves, we invite you to join our team.