Senior Data Engineer

We are building a greenfield MVP of a healthcare analytics platform focused on patient-level insights from large-scale pharmacy claims data. The platform is being developed by a newly formed client that already has customer interest, and it will be used for real-time patient analytics at the point of service.

All data is ingested via Snowflake Share from external vendors (no ingestion layer needed) and processed through a typical ETL pipeline to create a final patient-level dataset (~300M rows). This normalized output will be loaded into a PostgreSQL database (or comparable RDBMS; final tooling to be confirmed) and served via a low-latency REST API.
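
For context, here is a minimal sketch of the read side of this flow, assuming the vendor share is mounted as a database named VENDOR_CLAIMS_SHARE; all object names and credentials below are placeholders rather than confirmed project identifiers:

```python
# Minimal sketch: query the weekly raw claims drop exposed through the Snowflake Share.
# Account, credentials, and object names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",          # in practice, supplied via environment/secrets manager
    user="etl_user",
    password="...",
    warehouse="ETL_WH",
    database="VENDOR_CLAIMS_SHARE",  # database created from the vendor's share
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM RAW_PHARMACY_CLAIMS")
    print("raw claim rows this cycle:", cur.fetchone()[0])
finally:
    conn.close()
```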

Key pipeline stages include:

  • Standardization (cleansing, mapping, enrichment using BIN/PCN lookups)
  • Projection and extrapolation using a simple classification model or proximity search
  • Summarization to per-patient records
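
As a rough illustration of the standardization stage above, here is a toy BIN/PCN payer-enrichment lookup in plain Python; the mapping values, field names, and claim structure are assumptions rather than the project's actual reference data, and in the real pipeline this would be a reference-table join in Snowflake:

```python
# Illustrative-only BIN/PCN -> payer enrichment; values and field names are hypothetical.
from typing import Optional

# In the real pipeline this mapping would live in a Snowflake reference table.
BIN_PCN_TO_PAYER = {
    ("004336", "ADV"): "Example Payer A",
    ("610014", "MEDDPRIME"): "Example Payer B",
}

def enrich_claim(claim: dict) -> dict:
    """Attach a payer name to a raw claim row using its BIN/PCN pair, if known."""
    payer: Optional[str] = BIN_PCN_TO_PAYER.get((claim.get("bin"), claim.get("pcn")))
    return {**claim, "payer_name": payer}

print(enrich_claim({"claim_id": 1, "bin": "004336", "pcn": "ADV"}))
```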

The data is updated weekly (batch-based system). We are not building the ML model but must integrate with it and support its output. The system will initially serve two core API endpoints:

  1. Given patient info, return plan/coverage info with a confidence score
  2. Given patient info, return medical history
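
A rough sketch of the two response payloads, purely to illustrate the contract; field names and types are assumptions, not a finalized API schema:

```python
# Hypothetical response shapes for the two endpoints; all field names are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class CoverageResponse:
    patient_id: str
    payer_name: str
    plan_name: str
    confidence: float            # score attached by the projection/classification step

@dataclass
class MedicalHistoryEntry:
    fill_date: str               # ISO date of the pharmacy claim
    ndc: str                     # National Drug Code
    drug_name: str

@dataclass
class MedicalHistoryResponse:
    patient_id: str
    history: List[MedicalHistoryEntry]
```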

You will be part of a lean, senior-level engineering team and expected to own key parts of the ETL and data modeling effort.

Key Responsibilities

  • Build performant and scalable ETL pipelines in Snowflake, transforming wide raw claims datasets into normalized outputs
  • Apply cleansing, mapping, enrichment logic (e.g., payer enrichment via BIN/PCN lookups)
  • Collaborate on projection/extrapolation workflows, integrating with classification models or rules-based engines
  • Load processed outputs into PostgreSQL to power real-time REST API endpoints (a rough load sketch follows this list)
  • Tune Snowflake queries for cost-efficiency and speed, and optimize workloads for batch processing (~weekly cadence)
  • Work closely with the API engineer to ensure alignment between data schema and API needs
  • Ensure data privacy and compliance, and coordinate PHI de-identification (with Datavant)
  • Contribute to architectural decisions and the implementation roadmap in a fast-moving MVP cycle
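
As one possible shape of the PostgreSQL load step referenced above, here is a sketch that bulk-loads a CSV extract of the patient-level output with COPY; table, column, file, and connection details are placeholders, and the real pipeline may stage exports from Snowflake to S3 first:

```python
# Sketch of a bulk load into PostgreSQL; all names and connection details are placeholders.
import psycopg2

LOAD_SQL = """
    COPY patient_summary (patient_id, payer_name, plan_name, confidence, last_fill_date)
    FROM STDIN WITH (FORMAT csv, HEADER true)
"""

conn = psycopg2.connect("dbname=analytics user=loader")  # connection details are assumptions
try:
    with conn, conn.cursor() as cur, open("patient_summary_week.csv") as f:
        cur.copy_expert(LOAD_SQL, f)  # streams the file through COPY for fast bulk ingest
finally:
    conn.close()
```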

Requirements

  • 5+ years in data engineering or data platform development roles
  • Advanced SQL skills and experience working with wide, high-volume datasets (e.g., 100M+ rows)
  • Experience with Snowflake or readiness to quickly ramp up on it, including performance tuning and familiarity with native features (streams, tasks, stages)
  • Proficiency in Python for scripting, orchestration, and integration
  • Experience working with batch pipelines and familiarity with data warehousing best practices
  • Solid understanding of ETL design patterns and ability to work independently in a small, fast-paced team
  • Awareness of data compliance standards (HIPAA, PHI de-identification workflows)

Preferred Qualifications

  • Experience with Snowpark (Python) or other in-Snowflake processing tools
  • Familiarity with payer enrichment workflows or healthcare claims data
  • Previous use of classification models, vector similarity, or proximity-based data inference
  • Hands-on experience with AWS (EC2, S3) and integrating cloud resources with Snowflake
  • Exposure to PostgreSQL and API integration for analytic workloads
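
For candidates less familiar with Snowpark, here is a minimal sketch of how the per-patient summarization stage could be expressed in-Snowflake with Snowpark Python; connection details, table names, and column names are hypothetical:

```python
# Hypothetical Snowpark sketch of the per-patient summarization stage.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, count, max as max_

# Connection parameters would normally come from a secrets manager; placeholders here.
session = Session.builder.configs({
    "account": "your_account",
    "user": "etl_user",
    "password": "...",
    "warehouse": "ETL_WH",
    "database": "ANALYTICS",
    "schema": "STAGING",
}).create()

# Enriched claims table produced by the standardization stage (name is an assumption).
claims = session.table("ENRICHED_CLAIMS")

# Summarize to one row per patient: claim count and most recent fill date.
patient_summary = claims.group_by("PATIENT_ID").agg(
    count(col("CLAIM_ID")).alias("CLAIM_COUNT"),
    max_(col("FILL_DATE")).alias("LAST_FILL_DATE"),
)

# Persist the per-patient output for the downstream PostgreSQL load.
patient_summary.write.save_as_table("PATIENT_SUMMARY", mode="overwrite")
```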
