Senior Data Engineer
About the Platform
We’re building a unified data ecosystem that connects raw data, analytical models, and intelligent decision layers.
The platform combines the principles of data lakes, lakehouses, and modern data warehouses — structured around the Medallion architecture (Bronze / Silver / Gold).
Every dataset is versioned, governed, and traceable through a unified catalog and lineage framework.
This environment supports analytics, KPI computation, and AI-driven reasoning, designed for performance, transparency, and future scalability. It is built in partnership with GCP, OpenAI, and Cohere.
What You’ll Work On
1. Data Architecture & Foundations
- Design, implement, and evolve medallion-style data pipelines — from raw ingestion to curated, business-ready models.
- Build hybrid data lakes and lakehouses using Iceberg, Delta, or Parquet formats with ACID control and schema evolution (see the sketch after this list).
- Architect data warehouses that unify batch and streaming sources into a consistent, governed analytics layer.
- Ensure optimal partitioning, clustering, and storage strategies for large-scale analytical workloads.
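To give a flavor of this work, here is a minimal sketch of defining a partitioned lakehouse table, assuming a Spark session with an Iceberg catalog already configured; the table and column names are illustrative, not a real schema.

```python
# Minimal sketch: a Silver-layer Iceberg table with an explicit partition
# strategy. Assumes spark.sql.catalog config for Iceberg is already in place.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-ddl").getOrCreate()

# Partitioning is chosen up front; the format handles schema evolution later.
spark.sql("""
    CREATE TABLE IF NOT EXISTS silver.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(12, 2),
        order_ts    TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(order_ts))
""")

# Adding a column later is a metadata-only change under Iceberg's schema evolution.
spark.sql("ALTER TABLE silver.orders ADD COLUMN channel STRING")
```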
2. Data Ingestion & Transformation
- Create ingestion frameworks for APIs, IoT, ERP, and streaming systems (Kafka, Pub/Sub).
- Develop reproducible ETL/ELT pipelines using Airflow, dbt, Spark, or Dataflow.
- Manage CDC and incremental data loads, ensuring freshness and resilience (a merge sketch follows this list).
- Apply quality validation, schema checks, and contract-based transformations at every stage.
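As one hedged example of the CDC work, the sketch below applies an incremental change batch to the Silver table from the previous section. The landing path, the `op` change-type column, and the batch format are assumptions about how upstream capture is organized.

```python
# Sketch: idempotent CDC upsert into an Iceberg table via MERGE INTO.
# Assumes the same Iceberg-enabled Spark session; paths and names are invented.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cdc-merge").getOrCreate()

# Load the latest change batch (e.g., landed from Kafka or a CDC tool).
(spark.read.format("parquet")
    .load("s3://landing/orders_cdc/")   # hypothetical landing path
    .createOrReplaceTempView("orders_cdc"))

# One atomic pass applies inserts, updates, and deletes.
spark.sql("""
    MERGE INTO silver.orders AS t
    USING orders_cdc AS s
    ON t.order_id = s.order_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET
        t.amount = s.amount, t.order_ts = s.order_ts
    WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT
        (order_id, customer_id, amount, order_ts)
        VALUES (s.order_id, s.customer_id, s.amount, s.order_ts)
""")
```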
3. Governance, Cataloging & Lineage
- Implement a unified data catalog with lineage visibility, metadata capture, and schema versioning.
- Integrate dbt metadata, OpenLineage, and Great Expectations to enforce data quality.
- Define clear governance rules: data contracts, access policies, and change auditability (a contract-check sketch follows this list).
- Ensure every dataset is explainable and fully traceable back to its source.
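The sketch below is a deliberately framework-free stand-in for what tools like Great Expectations or dbt tests enforce: a batch is only promoted if it satisfies its contract. The contract shape and field names are invented for illustration.

```python
# Simplified data-contract check: schema presence, type drift, and null rules.
# A real deployment would express this in Great Expectations or dbt tests.
from pyspark.sql import DataFrame

ORDERS_CONTRACT = {
    "columns": {"order_id": "bigint", "customer_id": "bigint",
                "amount": "decimal(12,2)", "order_ts": "timestamp"},
    "not_null": ["order_id", "order_ts"],
}

def validate_contract(df: DataFrame, contract: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the batch may promote."""
    errors = []
    actual = dict(df.dtypes)  # Spark exposes (name, type-string) pairs
    for col, expected in contract["columns"].items():
        if col not in actual:
            errors.append(f"missing column: {col}")
        elif actual[col] != expected:
            errors.append(f"type drift on {col}: {actual[col]} != {expected}")
    for col in contract["not_null"]:
        if col in actual and df.filter(df[col].isNull()).limit(1).count() > 0:
            errors.append(f"null values in {col}")
    return errors
```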
4. Data Modeling & Lakehouse Operations
- Design dimensional models and business data marts to power dashboards and KPI analytics.
- Develop curated Gold-layer tables that serve as trusted sources of truth for analytics and AI workloads (sketched after this list).
- Maintain materialized views and tune query performance for analytical efficiency.
- Manage cross-domain joins and unified semantics across products, customers, or operational processes.
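As an illustrative Gold-layer build, the sketch below derives a daily revenue mart by joining the Silver fact to a customer dimension. The mart, dimension, and segment columns are assumptions, not a real schema.

```python
# Sketch: a curated Gold-layer KPI table built from Silver inputs via CTAS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gold-build").getOrCreate()

spark.sql("""
    CREATE OR REPLACE TABLE gold.daily_revenue_by_segment
    USING iceberg AS
    SELECT
        date_trunc('DAY', o.order_ts) AS order_date,
        c.segment,
        SUM(o.amount)                 AS revenue,
        COUNT(DISTINCT o.customer_id) AS active_customers
    FROM silver.orders o
    JOIN silver.dim_customer c ON o.customer_id = c.customer_id
    GROUP BY date_trunc('DAY', o.order_ts), c.segment
""")
```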
5. Observability, Reliability & Performance
- Monitor data pipeline health, freshness, and cost using modern observability tools (Prometheus, Grafana, Cloud Monitoring); a freshness probe is sketched after this list.
- Build proactive alerting, anomaly detection, and drift monitoring for datasets.
- Implement CI/CD workflows for data infrastructure using Terraform, Helm, and ArgoCD.
- Continuously improve query performance and storage efficiency across warehouses and lakehouses.
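A minimal freshness probe might look like the sketch below, which exposes a staleness gauge for Prometheus to scrape. The table, port, and five-minute cadence are assumptions; in practice this logic would run inside the orchestrator rather than a standalone loop.

```python
# Sketch: publish dataset staleness as a Prometheus gauge.
import time

from prometheus_client import Gauge, start_http_server
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("freshness-probe").getOrCreate()
freshness = Gauge("dataset_staleness_seconds",
                  "Seconds since the newest row landed", ["table"])

start_http_server(9108)  # expose /metrics for Prometheus scraping

while True:
    # Newest event timestamp in the table, as a unix epoch.
    max_ts = spark.sql(
        "SELECT unix_timestamp(MAX(order_ts)) FROM silver.orders").first()[0]
    if max_ts is not None:
        freshness.labels(table="silver.orders").set(time.time() - max_ts)
    time.sleep(300)  # re-probe every five minutes
```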
6. Unified Data & Semantic Layers
- Help define a unified semantic model that connects operational, analytical, and AI-ready data.
- Work with AI and analytics teams to structure datasets for semantic search, simulation, and reasoning systems.
- Collaborate on vectorized data representation and process-relationship modeling in graph or vector databases (a toy similarity lookup follows this list).
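To make the vector-representation work concrete, here is a toy cosine-similarity lookup over precomputed embeddings, standing in for a real vector database. The embedding dimensionality and the random stand-in vectors are assumptions; in production the corpus would come from an embedding model or API.

```python
# Toy semantic lookup: rank stored dataset embeddings against a query vector.
import numpy as np

def cosine_top_k(query_vec: np.ndarray, matrix: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows in `matrix` most similar to `query_vec`."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    scores = matrix @ query_vec / np.where(norms == 0, 1.0, norms)
    return np.argsort(scores)[::-1][:k]

# Usage: dataset descriptions embedded offline, then matched to a user question.
corpus = np.random.rand(1000, 768)  # stand-in for stored dataset embeddings
query = np.random.rand(768)         # stand-in for an embedded user query
print(cosine_top_k(query, corpus))
```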
What We’re Looking For
- 5+ years of hands-on experience building large-scale data platforms, warehouses, or lakehouses.
- Strong proficiency in SQL, Python, and distributed processing frameworks (PySpark, Spark, Dataflow).
- Deep understanding of Medallion architecture, data modeling, and modern ETL orchestration (Airflow, dbt).
- Experience implementing data catalogs, lineage tracking, and validation frameworks.
- Knowledge of data governance, schema evolution, and contract-based transformations.
- Familiarity with streaming architectures, CDC patterns, and real-time analytics.
- Practical understanding of FinOps, data performance tuning, and cost management in analytical environments.
- Strong foundation in metadata-driven orchestration, observability, and automated testing.
- Bonus: experience with ClickHouse, Trino, Iceberg, or hybrid on-prem/cloud data deployments.
You’ll Excel If You
- Think of data systems as living, evolving architectures — not just pipelines.
- Care deeply about traceability, scalability, and explainability.
- Love designing platforms that unify data across analytics, AI, and process intelligence.
- Are pragmatic, hands-on, and focused on building systems that last.
Required Languages
- English: B2 (Upper Intermediate)