Senior Data Engineer
We are looking for a Senior Data Engineer to help build a next-generation data and AI platform. The platform combines real-time data ingestion, multi-database storage (relational, analytical, graph, vector), and AI-driven analytics. You will design scalable pipelines, ensure data quality, manage governance, and optimize data flows for analytics and ML/AI workloads.
Key Responsibilities:
1. Data Ingestion & Integration
- Build and maintain scalable data ingestion pipelines (batch and streaming) from enterprise systems (ERP, CRM, WMS, IoT).
- Apply transformations, masking, and validations for regulatory compliance (e.g., HIPAA, GDPR).
2. ETL/ELT & Data Processing
- Develop ETL/ELT workflows using tools like Airflow or Spark.
- Work with ML/AI teams to structure data for analytics, simulations, and LLM-powered use cases.
3. Multi-Database Storage
- Design and optimize data storage across:
  - Relational (e.g., PostgreSQL)
  - Analytical (e.g., Snowflake, BigQuery)
  - Graph (e.g., Neo4j)
  - Vector (e.g., Pinecone, Milvus)
- Align storage design with specific workloads for maximum efficiency.
4. Governance & Data Quality
- Implement data quality checks, lineage, metadata management, and MDM practices.
- Apply secure data handling and role-based access control.
5. Performance & Scalability
- Monitor and optimize pipelines for latency, throughput, and reliability.
- Apply best practices in distributed and streaming data processing (e.g., Kafka, Spark, Flink).
6. Collaboration & Documentation
- Work with DevOps, Data Science, and AI/ML teams to align pipelines with product needs.
- Maintain clear documentation of data flows and governance policies.
Qualifications:
- 5+ years of experience in data engineering, including ETL/ELT pipeline development, data lakes, or streaming systems.
- Strong experience with Python (incl. FastAPI) and SQL.
- Experience with cloud platforms: primarily GCP, with some AWS.
- Hands-on experience with tools like Kafka, Airflow, and Spark.
- Familiarity with relational, analytical, NoSQL, graph, and vector databases.
- Understanding of metadata, lineage, MDM, and data compliance (GDPR, HIPAA).
- Strong grasp of data security, encryption, and access control.
- Excellent communication and cross-functional collaboration skills.
Why This Role:
Impact: Contribute to building a core data foundation for a powerful AI platform.
Growth: Gain experience across streaming, multi-DB architectures, and AI-focused data use cases.
Collaboration: Work alongside data scientists, engineers, and AI researchers to create automated, intelligent solutions.
We offer:
- Competitive compensation based on experience and skills.
- Flexible working hours and remote work environment.
- Opportunities for professional growth and development.
- Collaborative and innovative team culture.
- Participation in exciting and challenging projects.
This position offers the opportunity to shape a robust data ecosystem for a cutting-edge AI platform, ensuring the quality and reliability of data that underpins analytics, process intelligence, and decision automation.