Senior Data Engineer

About the Platform

We’re building a unified data ecosystem that connects raw data, analytical models, and intelligent decision layers.
The platform combines the principles of data lakes, lakehouses, and modern data warehouses — structured around the Medallion architecture (Bronze / Silver / Gold).
Every dataset is versioned, governed, and traceable through a unified catalog and lineage framework.
This environment supports analytics, KPI computation, and AI-driven reasoning, and is designed for performance, transparency, and future scalability (in partnership with GCP, OpenAI, and Cohere).
 

What You’ll Work On

1. Data Architecture & Foundations

  • Design, implement, and evolve medallion-style data pipelines — from raw ingestion to curated, business-ready models.
  • Build hybrid data lakes and lakehouses on open table formats (Apache Iceberg, Delta Lake) over Parquet, with ACID guarantees and schema evolution (see the sketch after this list).
  • Architect data warehouses that unify batch and streaming sources into a consistent, governed analytics layer.
  • Ensure optimal partitioning, clustering, and storage strategies for large-scale analytical workloads.
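
As a rough illustration of the kind of pipeline step described above, the sketch below promotes a raw Bronze dataset to a curated Silver table in PySpark. The bucket paths, column names, and partition key are illustrative assumptions, and a Delta or Iceberg deployment would swap the plain Parquet write for the corresponding table format.

```python
# A minimal Bronze -> Silver promotion step in PySpark. Paths, column names,
# and the partitioning key are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver_orders").getOrCreate()

# Bronze: raw ingested events, kept as-is for replayability.
bronze = spark.read.parquet("gs://lake/bronze/orders/")

# Silver: deduplicated, typed, and quality-filtered records.
silver = (
    bronze
    .dropDuplicates(["order_id"])                        # one row per business key
    .withColumn("order_ts", F.to_timestamp("order_ts"))  # enforce types early
    .filter(F.col("order_id").isNotNull())                # basic quality gate
)

# Partition by date so downstream scans prune cheaply.
(
    silver
    .withColumn("ingest_date", F.to_date("order_ts"))
    .write.mode("overwrite")
    .partitionBy("ingest_date")
    .parquet("gs://lake/silver/orders/")
)
```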

2. Data Ingestion & Transformation

  • Create ingestion frameworks for APIs, IoT, ERP, and streaming systems (Kafka, Pub/Sub).
  • Develop reproducible ETL/ELT pipelines using Airflow, dbt, Spark, or Dataflow.
  • Manage change data capture (CDC) and incremental data loads, ensuring freshness and resilience (see the sketch after this list).
  • Apply quality validation, schema checks, and contract-based transformations at every stage.
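
A minimal sketch of the incremental/CDC pattern referenced in this list: read only rows changed since a stored watermark, keep the latest change per key, and publish to the Silver layer. Table paths, column names, and the hard-coded watermark are assumptions for illustration; with Delta or Iceberg the final write would typically be a MERGE.

```python
# Incremental load with a high-watermark and last-write-wins de-duplication,
# the pattern behind many CDC pipelines. All names are illustrative.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("incremental_customers").getOrCreate()

# Previously processed watermark (hard-coded here; in practice read from a
# state table or from the target itself).
last_watermark = "2024-01-01T00:00:00"

# Pull only rows changed since the last run.
changes = (
    spark.read.parquet("gs://lake/bronze/customers_cdc/")
    .filter(F.col("updated_at") > F.lit(last_watermark))
)

# Keep the latest change per business key (last write wins).
latest = (
    changes
    .withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
        ),
    )
    .filter("rn = 1")
    .drop("rn")
)

# Append-style publish; with Delta or Iceberg this step would be an upsert/MERGE.
latest.write.mode("append").parquet("gs://lake/silver/customers/")
```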

3. Governance, Cataloging & Lineage

  • Implement a unified data catalog with lineage visibility, metadata capture, and schema versioning.
  • Integrate dbt metadata, OpenLineage, and Great Expectations to enforce data quality.
  • Define clear governance rules: data contracts, access policies, and change auditability.
  • Ensure every dataset is explainable and fully traceable back to its source.
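
To make the idea of contract-based transformations concrete, here is a minimal, framework-free sketch in which an expected Spark schema acts as the data contract and violations fail the pipeline before anything is published. The contract itself is a made-up example; on this platform the same gate would typically be expressed as dbt tests or Great Expectations suites.

```python
# Fail-fast contract check: the expected schema is the contract, and any
# drift (missing columns, wrong types) stops the pipeline before publishing.
from pyspark.sql import DataFrame
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Hypothetical contract for an orders dataset.
ORDERS_CONTRACT = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("order_ts", TimestampType(), nullable=False),
])

def enforce_contract(df: DataFrame, contract: StructType) -> DataFrame:
    """Raise if the dataframe drifts from the agreed contract."""
    expected = {f.name: f.dataType for f in contract.fields}
    actual = {f.name: f.dataType for f in df.schema.fields}

    missing = set(expected) - set(actual)
    if missing:
        raise ValueError(f"Contract violation, missing columns: {sorted(missing)}")

    mismatched = {c for c in expected if c in actual and actual[c] != expected[c]}
    if mismatched:
        raise ValueError(f"Contract violation, wrong types: {sorted(mismatched)}")

    # Project to the contracted columns so accidental extras never leak downstream.
    return df.select(*expected.keys())
```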

4. Data Modeling & Lakehouse Operations

  • Design dimensional models and business data marts to power dashboards and KPI analytics.
  • Develop curated Gold-layer tables that serve as trusted sources of truth for analytics and AI workloads.
  • Maintain materialized views and tune query performance for analytical efficiency.
  • Manage cross-domain joins and unified semantics across products, customers, or operational processes.
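
A small sketch of what a curated Gold-layer KPI table can look like in PySpark: join Silver facts to a dimension and aggregate into a business-ready mart. The table paths, join key, and KPI definitions are illustrative assumptions.

```python
# Gold-layer KPI mart: daily revenue and activity by customer segment.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gold_daily_revenue").getOrCreate()

orders = spark.read.parquet("gs://lake/silver/orders/")        # fact
customers = spark.read.parquet("gs://lake/silver/customers/")  # dimension

daily_revenue = (
    orders.join(customers, "customer_id", "left")
    .groupBy(F.to_date("order_ts").alias("order_date"), "customer_segment")
    .agg(
        F.sum("amount").alias("revenue"),
        F.countDistinct("order_id").alias("orders"),
        F.countDistinct("customer_id").alias("active_customers"),
    )
)

# Published as a trusted, partitioned source of truth for dashboards and AI workloads.
(
    daily_revenue
    .write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("gs://lake/gold/daily_revenue_by_segment/")
)
```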

5. Observability, Reliability & Performance

  • Monitor data pipeline health, freshness, and cost using modern observability tools (Prometheus, Grafana, Cloud Monitoring).
  • Build proactive alerting, anomaly detection, and drift monitoring for datasets.
  • Implement CI/CD workflows for data infrastructure using Terraform, Helm, and ArgoCD.
  • Continuously improve query performance and storage efficiency across warehouses and lakehouses.
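
As one possible shape for freshness monitoring, the sketch below measures dataset staleness and exposes it as a Prometheus metric that alerting rules can act on. The Pushgateway address, dataset path, and metric name are assumptions; on GCP the same signal could equally be written to Cloud Monitoring.

```python
# Freshness probe: how stale is the newest record in a Silver table?
import time
from pyspark.sql import SparkSession, functions as F
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

spark = SparkSession.builder.appName("orders_freshness_probe").getOrCreate()

latest_ts = (
    spark.read.parquet("gs://lake/silver/orders/")
    .agg(F.max("order_ts").alias("latest"))
    .collect()[0]["latest"]
)

# Assumes order_ts and the host clock share a timezone.
staleness_seconds = time.time() - latest_ts.timestamp()

registry = CollectorRegistry()
gauge = Gauge(
    "dataset_staleness_seconds",
    "Seconds since the newest record in the dataset",
    ["dataset"],
    registry=registry,
)
gauge.labels(dataset="silver.orders").set(staleness_seconds)

# An alert rule (e.g. staleness > 2h) then fires in Prometheus/Grafana.
push_to_gateway("pushgateway:9091", job="data_freshness", registry=registry)
```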

6. Unified Data & Semantic Layers

  • Help define a unified semantic model that connects operational, analytical, and AI-ready data.
  • Work with AI and analytics teams to structure datasets for semantic search, simulation, and reasoning systems.
  • Collaborate on vectorized data representation and process-relationship modeling (graph or vector DBs).
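
A hedged sketch of what vectorized data representation can look like in practice: embed curated Gold-layer records so they can be indexed for semantic search. The OpenAI client, model name, and record shape are assumptions for illustration; Cohere or an in-house embedding service would slot in the same way, with vectors landing in a vector store or a vector-capable warehouse column.

```python
# Attach embeddings to curated records so they can be indexed for semantic search.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_products(products: list[dict]) -> list[dict]:
    """Return the records with an added embedding field."""
    texts = [f"{p['name']}. {p['description']}" for p in products]
    # Model name is an assumption; any embedding endpoint works the same way.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [
        {**product, "embedding": item.embedding}
        for product, item in zip(products, response.data)
    ]

rows = embed_products([
    {"product_id": "p-1", "name": "Sensor kit", "description": "Industrial IoT sensor bundle."},
])
# rows[0]["embedding"] is now ready to be written to a vector index.
```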


What We’re Looking For

  • 5+ years of hands-on experience building large-scale data platforms, warehouses, or lakehouses.
  • Strong proficiency in SQL, Python, and distributed processing frameworks (PySpark, Spark, Dataflow).
  • Deep understanding of Medallion architecture, data modeling, and modern ETL orchestration (Airflow, dbt).
  • Experience implementing data catalogs, lineage tracking, and validation frameworks.
  • Knowledge of data governance, schema evolution, and contract-based transformations.
  • Familiarity with streaming architectures, CDC patterns, and real-time analytics.
  • Practical understanding of FinOps, data performance tuning, and cost management in analytical environments.
  • Strong foundation in metadata-driven orchestration, observability, and automated testing.
  • Bonus: experience with ClickHouse, Trino, Iceberg, or hybrid on-prem/cloud data deployments.


You’ll Excel If You

  • Think of data systems as living, evolving architectures — not just pipelines.
  • Care deeply about traceability, scalability, and explainability.
  • Love designing platforms that unify data across analytics, AI, and process intelligence.
  • Are pragmatic, hands-on, and focused on building systems that last.

Required languages

English B2 - Upper Intermediate

Key skills

SQL, Python, Data Warehouse, GCP, Apache Kafka
Published 9 October