Senior Data Engineer

We’re seeking an experienced Senior Data Engineer to join a healthcare project and build, maintain, and evolve the data infrastructure that powers an AI-driven healthcare platform.

The role focuses on designing robust data pipelines, managing a centralized data lake architecture using AWS Lake Formation, and ensuring high-quality processing of both structured and unstructured healthcare data. You’ll work closely with data science, ML engineering, and backend teams to deliver scalable, secure, and compliant data solutions for aesthetic medicine applications.

 

Responsibilities:

  • Design and implement scalable data pipelines for diverse healthcare data sources using AWS services;
  • Build and maintain a centralized data lake using AWS Lake Formation for secure storage of structured and unstructured medical data;
  • Develop data ingestion, transformation, and processing workflows for multimodal healthcare data, including medical images, clinical documentation, and practice data;
  • Implement preprocessing pipelines for unstructured data using tools such as Bedrock Data Automation and LlamaIndex;
  • Build and maintain ETL/ELT processes with proper data governance and security controls;
  • Implement data quality monitoring systems and validation frameworks;
  • Support RAG system implementation with optimized data storage and retrieval mechanisms;
  • Develop and maintain data crawlers for collecting domain-specific medical content;
  • Ensure HIPAA compliance across all data handling and processing workflows;
  • Collaborate with data scientists and ML engineers to provide high-quality data for model training and AI features.
     

Required Qualifications:

  • 4+ years of experience in data engineering roles;
  • Strong experience with AWS data services (S3, Glue, Lake Formation, Athena, EMR);
  • Proficiency in Python, SQL, and data processing frameworks;
  • Experience with data lakehouse architectures and ETL pipeline development;
  • Strong background in managing unstructured data pipelines and preprocessing workflows;
  • Experience with AWS analytics services (Glue Catalog, Glue ETL, Athena);
  • Knowledge of data quality frameworks (Great Expectations, Glue Data Quality);
  • Familiarity with vector databases and embedding generation for LLMs;
  • Understanding of data security and HIPAA compliance requirements;
  • Experience with data orchestration tools (Dagster, Airflow, AWS MWAA);
  • At least an Upper Intermediate level of spoken and written English.

     

Preferred Qualifications:

  • Experience with Apache Iceberg table format for data lakehouse organization;
  • Experience using Dagster for data orchestration;
  • Hands-on experience with AWS CDK for Infrastructure as Code;
  • Background in preprocessing data for LLM applications (text extraction, semantic chunking);
  • Experience with real-time data streaming architectures;
  • Familiarity with healthcare data structures and medical terminology;
  • Experience with multi-account AWS data governance;
  • Background in healthcare data engineering or HIPAA-compliant systems.

     

IT Craft offers:

  • Competitive compensation based on qualifications;
  • Flexible working hours, remote work;
  • Opportunity for career growth;
  • Rewards for sports activities;
  • In-house English training;
  • A friendly team of open-minded people.

Please send your CV.

By submitting your application, you consent to the processing of your personal data in accordance with IT Craft's Privacy Policy, available at https://itechcraft.com/datenschutz/.

Required languages

English B2 - Upper Intermediate
Published 2 January