Lead Data Engineer

This Lead Data Engineer role focuses on building the data backbone behind large-scale scientific knowledge systems, including ingestion, transformation, semantic modeling, and high-efficiency access layers.
 

The position centers on creating an AI-ready knowledge graph that connects research papers, ontology data, experts, and institutions through scalable pipelines and vector-based retrieval.


Role Objective

The role owns data architecture and pipeline scalability, with direct responsibility for designing a fast, reliable, and extensible system capable of processing scientific data at scale. It involves shaping technical direction, implementing distributed ingestion workloads, and enabling AI-driven insights through modern graph and vector data technologies.
 

Key Responsibilities

  • Design and maintain scalable, end-to-end data pipelines for ingestion and processing
  • Build and optimize a scientific knowledge graph and related taxonomies
  • Implement semantic search and vector indexing workflows
  • Create and evolve a unified data access layer for internal consumption
  • Leverage AI tools for acceleration across ideation, automation, and delivery
  • Apply a fast-iteration delivery approach with strong documentation habits
  • Work cross-functionally with AI, engineering, and product stakeholders
     

Technical Requirements

  • Strong Python skills for ETL, data manipulation, and graph workflows
  • Experience with distributed or parallel computation (Spark, Dask, Polars, or equivalents)
  • Knowledge graph design and graph tooling (Neo4j, RDFLib, property graph models)
  • Familiarity with vector database technologies (FAISS, Pinecone, Qdrant, Weaviate)
  • Experience with ETL orchestration (Airflow, Dagster, dbt, or custom)
  • Ability to work with formats including Parquet, JSONL, CSV, and RDF serializations such as Turtle
  • Experience working with public research data APIs (OpenAlex, ORCID, PubMed) is a plus
     

Expected Approach & Mindset

  • System-level thinking with a preference for simplicity and speed
  • Independent work discipline with clear communication practices
  • Comfort working with incomplete, inconsistent scientific datasets
  • Execution-driven mindset: shipping over perfection
  • Ability to collaborate across disciplines without heavy process overhead
     

Work Philosophy

This role suits engineers who enjoy technical ownership, autonomy, and building modern, AI-integrated data systems from the ground up. The environment favors curiosity, pragmatic decision-making, high transparency, and rapid iteration cycles instead of rigid process structures.


Why This Role Matters

The resulting platform helps accelerate scientific discovery by transforming raw research data into searchable, interconnected knowledge accessible through modern AI systems. The impact is measurable, broad, and meaningful.

Required languages

English C1 - Advanced

The job ad is no longer active
