Senior Data Engineer

About the Platform

We’re building a unified data ecosystem that connects raw data, analytical models, and intelligent decision layers.
The platform combines the principles of data lakes, lakehouses, and modern data warehouses — structured around the Medallion architecture (Bronze / Silver / Gold).
Every dataset is versioned, governed, and traceable through a unified catalog and lineage framework.
This environment supports analytics, KPI computation, and AI-driven reasoning, and is designed for performance, transparency, and future scalability (in partnership with GCP, OpenAI, and Cohere).
 

What You’ll Work On

1. Data Architecture & Foundations

  • Design, implement, and evolve medallion-style data pipelines — from raw ingestion to curated, business-ready models.
  • Build hybrid data lakes and lakehouses on open table formats (Apache Iceberg, Delta Lake) over Parquet, with ACID guarantees and schema evolution (see the sketch after this list).
  • Architect data warehouses that unify batch and streaming sources into a consistent, governed analytics layer.
  • Ensure optimal partitioning, clustering, and storage strategies for large-scale analytical workloads.
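
As a rough illustration of the kind of pipeline step described above, the sketch below promotes a raw Bronze dataset to a curated Silver table in PySpark. The bucket paths, column names, and partition key are illustrative assumptions, and a Delta or Iceberg deployment would swap the plain Parquet write for the corresponding table format.

```python
# A minimal Bronze -> Silver promotion step in PySpark. Paths, column names,
# and the partitioning key are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver_orders").getOrCreate()

# Bronze: raw ingested events, kept as-is for replayability.
bronze = spark.read.parquet("gs://lake/bronze/orders/")

# Silver: deduplicated, typed, and quality-filtered records.
silver = (
    bronze
    .dropDuplicates(["order_id"])                        # one row per business key
    .withColumn("order_ts", F.to_timestamp("order_ts"))  # enforce types early
    .filter(F.col("order_id").isNotNull())                # basic quality gate
)

# Partition by date so downstream scans prune cheaply.
(
    silver
    .withColumn("ingest_date", F.to_date("order_ts"))
    .write.mode("overwrite")
    .partitionBy("ingest_date")
    .parquet("gs://lake/silver/orders/")
)
```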

2. Data Ingestion & Transformation

  • Create ingestion frameworks for APIs, IoT, ERP, and streaming systems (Kafka, Pub/Sub).
  • Develop reproducible ETL/ELT pipelines using Airflow, dbt, Spark, or Dataflow.
  • Manage change data capture (CDC) and incremental data loads, ensuring freshness and resilience (see the sketch after this list).
  • Apply quality validation, schema checks, and contract-based transformations at every stage.
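
A minimal sketch of the incremental/CDC pattern referenced in this list: read only rows changed since a stored watermark, keep the latest change per key, and publish to the Silver layer. Table paths, column names, and the hard-coded watermark are assumptions for illustration; with Delta or Iceberg the final write would typically be a MERGE.

```python
# Incremental load with a high-watermark and last-write-wins de-duplication,
# the pattern behind many CDC pipelines. All names are illustrative.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("incremental_customers").getOrCreate()

# Previously processed watermark (hard-coded here; in practice read from a
# state table or from the target itself).
last_watermark = "2024-01-01T00:00:00"

# Pull only rows changed since the last run.
changes = (
    spark.read.parquet("gs://lake/bronze/customers_cdc/")
    .filter(F.col("updated_at") > F.lit(last_watermark))
)

# Keep the latest change per business key (last write wins).
latest = (
    changes
    .withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
        ),
    )
    .filter("rn = 1")
    .drop("rn")
)

# Append-style publish; with Delta or Iceberg this step would be an upsert/MERGE.
latest.write.mode("append").parquet("gs://lake/silver/customers/")
```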

3. Governance, Cataloging & Lineage

  • Implement a unified data catalog with lineage visibility, metadata capture, and schema versioning.
  • Integrate dbt metadata, OpenLineage, and Great Expectations to enforce data quality.
  • Define clear governance rules: data contracts, access policies, and change auditability.
  • Ensure every dataset is explainable and fully traceable back to its source.
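
To make the idea of contract-based transformations concrete, here is a minimal, framework-free sketch in which an expected Spark schema acts as the data contract and violations fail the pipeline before anything is published. The contract itself is a made-up example; on this platform the same gate would typically be expressed as dbt tests or Great Expectations suites.

```python
# Fail-fast contract check: the expected schema is the contract, and any
# drift (missing columns, wrong types) stops the pipeline before publishing.
from pyspark.sql import DataFrame
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Hypothetical contract for an orders dataset.
ORDERS_CONTRACT = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("order_ts", TimestampType(), nullable=False),
])

def enforce_contract(df: DataFrame, contract: StructType) -> DataFrame:
    """Raise if the dataframe drifts from the agreed contract."""
    expected = {f.name: f.dataType for f in contract.fields}
    actual = {f.name: f.dataType for f in df.schema.fields}

    missing = set(expected) - set(actual)
    if missing:
        raise ValueError(f"Contract violation, missing columns: {sorted(missing)}")

    mismatched = {c for c in expected if c in actual and actual[c] != expected[c]}
    if mismatched:
        raise ValueError(f"Contract violation, wrong types: {sorted(mismatched)}")

    # Project to the contracted columns so accidental extras never leak downstream.
    return df.select(*expected.keys())
```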

4. Data Modeling & Lakehouse Operations

  • Design dimensional models and business data marts to power dashboards and KPI analytics.
  • Develop curated Gold-layer tables that serve as trusted sources of truth for analytics and AI workloads.
  • Maintain materialized views and tune query performance for analytical efficiency.
  • Manage cross-domain joins and unified semantics across products, customers, or operational processes.
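
A small sketch of what a curated Gold-layer KPI table can look like in PySpark: join Silver facts to a dimension and aggregate into a business-ready mart. The table paths, join key, and KPI definitions are illustrative assumptions.

```python
# Gold-layer KPI mart: daily revenue and activity by customer segment.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gold_daily_revenue").getOrCreate()

orders = spark.read.parquet("gs://lake/silver/orders/")        # fact
customers = spark.read.parquet("gs://lake/silver/customers/")  # dimension

daily_revenue = (
    orders.join(customers, "customer_id", "left")
    .groupBy(F.to_date("order_ts").alias("order_date"), "customer_segment")
    .agg(
        F.sum("amount").alias("revenue"),
        F.countDistinct("order_id").alias("orders"),
        F.countDistinct("customer_id").alias("active_customers"),
    )
)

# Published as a trusted, partitioned source of truth for dashboards and AI workloads.
(
    daily_revenue
    .write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("gs://lake/gold/daily_revenue_by_segment/")
)
```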

5. Observability, Reliability & Performance

  • Monitor data pipeline health, freshness, and cost using modern observability tools (Prometheus, Grafana, Cloud Monitoring).
  • Build proactive alerting, anomaly detection, and drift monitoring for datasets.
  • Implement CI/CD workflows for data infrastructure using Terraform, Helm, and ArgoCD.
  • Continuously improve query performance and storage efficiency across warehouses and lakehouses.
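
As one possible shape for freshness monitoring, the sketch below measures dataset staleness and exposes it as a Prometheus metric that alerting rules can act on. The Pushgateway address, dataset path, and metric name are assumptions; on GCP the same signal could equally be written to Cloud Monitoring.

```python
# Freshness probe: how stale is the newest record in a Silver table?
import time
from pyspark.sql import SparkSession, functions as F
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

spark = SparkSession.builder.appName("orders_freshness_probe").getOrCreate()

latest_ts = (
    spark.read.parquet("gs://lake/silver/orders/")
    .agg(F.max("order_ts").alias("latest"))
    .collect()[0]["latest"]
)

# Assumes order_ts and the host clock share a timezone.
staleness_seconds = time.time() - latest_ts.timestamp()

registry = CollectorRegistry()
gauge = Gauge(
    "dataset_staleness_seconds",
    "Seconds since the newest record in the dataset",
    ["dataset"],
    registry=registry,
)
gauge.labels(dataset="silver.orders").set(staleness_seconds)

# An alert rule (e.g. staleness > 2h) then fires in Prometheus/Grafana.
push_to_gateway("pushgateway:9091", job="data_freshness", registry=registry)
```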

6. Unified Data & Semantic Layers

  • Help define a unified semantic model that connects operational, analytical, and AI-ready data.
  • Work with AI and analytics teams to structure datasets for semantic search, simulation, and reasoning systems.
  • Collaborate on vectorized data representation and process-relationship modeling (graph or vector DBs).
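
A hedged sketch of what vectorized data representation can look like in practice: embed curated Gold-layer records so they can be indexed for semantic search. The OpenAI client, model name, and record shape are assumptions for illustration; Cohere or an in-house embedding service would slot in the same way, with vectors landing in a vector store or a vector-capable warehouse column.

```python
# Attach embeddings to curated records so they can be indexed for semantic search.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_products(products: list[dict]) -> list[dict]:
    """Return the records with an added embedding field."""
    texts = [f"{p['name']}. {p['description']}" for p in products]
    # Model name is an assumption; any embedding endpoint works the same way.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [
        {**product, "embedding": item.embedding}
        for product, item in zip(products, response.data)
    ]

rows = embed_products([
    {"product_id": "p-1", "name": "Sensor kit", "description": "Industrial IoT sensor bundle."},
])
# rows[0]["embedding"] is now ready to be written to a vector index.
```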


What We’re Looking For

  • 5+ years of hands-on experience building large-scale data platforms, warehouses, or lakehouses.
  • Strong proficiency in SQL, Python, and distributed processing frameworks (PySpark, Spark, Dataflow).
  • Deep understanding of Medallion architecture, data modeling, and modern ETL orchestration (Airflow, dbt).
  • Experience implementing data catalogs, lineage tracking, and validation frameworks.
  • Knowledge of data governance, schema evolution, and contract-based transformations.
  • Familiarity with streaming architectures, CDC patterns, and real-time analytics.
  • Practical understanding of FinOps, data performance tuning, and cost management in analytical environments.
  • Strong foundation in metadata-driven orchestration, observability, and automated testing.
  • Bonus: experience with ClickHouse, Trino, Iceberg, or hybrid on-prem/cloud data deployments.


You’ll Excel If You

  • Think of data systems as living, evolving architectures — not just pipelines.
  • Care deeply about traceability, scalability, and explainability.
  • Love designing platforms that unify data across analytics, AI, and process intelligence.
  • Are pragmatic, hands-on, and focused on building systems that last.

Required languages

English B2 - Upper Intermediate

Key skills

SQL, Python, Data Warehouse, GCP, Apache Kafka
Published 9 October