Senior Data Engineer
We are building a greenfield MVP for a healthcare analytics platform focused on patient-level insights from large-scale pharmacy claims data. The platform is being developed by a newly formed client that already has customer interest, and it will be used for real-time patient analytics at the point of service.
All data is ingested via Snowflake Share from external vendors (no ingestion layer needed) and processed through a typical ETL pipeline to create a final patient-level dataset (~300M rows). This normalized output will be loaded into a PostgreSQL database (or comparable RDBMS; final tooling to be confirmed) and served via a low-latency REST API.
Key pipeline stages include:
- Standardization (cleansing, mapping, enrichment using BIN/PCN lookups; see the sketch after this list)
- Projection and extrapolation using a simple classification model or proximity search
- Summarization to per-patient records
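To illustrate the BIN/PCN enrichment step referenced above, here is a minimal sketch. The table and column names (bin, pcn, payer_name, plan_type) are assumptions rather than the actual schema, and pandas is used purely for readability; in the real pipeline this join would more likely run inside Snowflake as SQL or Snowpark.

```python
# Illustrative only: table/column names are assumptions, not the real schema.
import pandas as pd

def enrich_payer(claims: pd.DataFrame, payer_lookup: pd.DataFrame) -> pd.DataFrame:
    """Attach payer attributes to raw claims via a BIN/PCN lookup."""
    claims, payer_lookup = claims.copy(), payer_lookup.copy()

    # Normalize join keys so formatting differences don't break the match.
    for df in (claims, payer_lookup):
        df["bin"] = df["bin"].astype(str).str.strip().str.zfill(6)
        df["pcn"] = df["pcn"].astype(str).str.strip().str.upper()

    enriched = claims.merge(
        payer_lookup[["bin", "pcn", "payer_name", "plan_type"]],
        on=["bin", "pcn"],
        how="left",  # keep claims that have no lookup match
    )
    enriched["payer_name"] = enriched["payer_name"].fillna("UNKNOWN")
    return enriched
```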
The data is updated weekly (batch-based system). We are not building the ML model but must integrate with it and support its output. The system will initially serve two core API endpoints:
- Given patient info, return plan/coverage info with confidence score
- Given patient info, return medical history
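As a rough sketch of how the per-patient output could back these two endpoints, the snippet below queries a hypothetical PostgreSQL schema (patient_coverage and patient_history tables keyed by patient_key); every name here is a placeholder, and the API layer itself is owned by the API engineer.

```python
# A minimal serving sketch, assuming a hypothetical normalized schema in PostgreSQL.
import psycopg2
import psycopg2.extras

def get_coverage(conn, patient_key: str) -> list[dict]:
    """Endpoint 1: plan/coverage info with the model's confidence score."""
    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
        cur.execute(
            """
            SELECT payer_name, plan_type, confidence_score
            FROM patient_coverage
            WHERE patient_key = %s
            ORDER BY confidence_score DESC
            """,
            (patient_key,),
        )
        return cur.fetchall()

def get_history(conn, patient_key: str) -> list[dict]:
    """Endpoint 2: medical (claims) history for the patient."""
    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
        cur.execute(
            """
            SELECT fill_date, ndc, drug_name, days_supply
            FROM patient_history
            WHERE patient_key = %s
            ORDER BY fill_date DESC
            """,
            (patient_key,),
        )
        return cur.fetchall()

conn = psycopg2.connect("dbname=analytics")  # connection details are placeholders
print(get_coverage(conn, "P123"), get_history(conn, "P123"))
```

With a B-tree index on patient_key, both lookups should stay low-latency even at the ~300M-row scale described above.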
You will be part of a lean, senior-level engineering team and expected to own key parts of the ETL and data modeling effort.
Key Responsibilities
- Build performant and scalable ETL pipelines in Snowflake, transforming wide raw claims datasets into normalized outputs
- Apply cleansing, mapping, enrichment logic (e.g., payer enrichment via BIN/PCN lookups)
- Collaborate on projection/extrapolation workflows, integrating with classification models or rules-based engines
- Load processed outputs into PostgreSQL to power real-time REST API endpoints (a minimal load sketch follows this list)
- Tune Snowflake queries for cost-efficiency and speed, and optimize workloads for batch processing (~weekly cadence)
- Work closely with the API engineer to ensure alignment between data schema and API needs
- Ensure data privacy and compliance, and coordinate PHI de-identification (with Datavant)
- Contribute to architectural decisions and the implementation roadmap in a fast-moving MVP cycle
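As referenced in the PostgreSQL loading bullet, here is a rough sketch of the weekly batch hand-off. It assumes the snowflake-connector-python and psycopg2 packages, and all table names, columns, and connection details are placeholders; at ~300M rows a file-based path (Snowflake unload to S3 plus PostgreSQL COPY) would likely be preferred, so this only shows the overall shape of the refresh.

```python
# Sketch of the weekly Snowflake -> PostgreSQL refresh; names and credentials are placeholders.
import snowflake.connector
import psycopg2
from psycopg2.extras import execute_values

BATCH = 50_000

sf = snowflake.connector.connect(account="...", user="...", password="...",
                                 warehouse="...", database="...", schema="...")
pg = psycopg2.connect("dbname=analytics")  # placeholder DSN

try:
    sf_cur = sf.cursor()
    sf_cur.execute("SELECT patient_key, payer_name, plan_type, confidence_score "
                   "FROM patient_coverage_final")
    with pg, pg.cursor() as pg_cur:
        # Refresh a staging table; a follow-up swap/rename into the serving table
        # is assumed, so the API never reads a half-loaded week.
        pg_cur.execute("TRUNCATE patient_coverage_staging")
        while True:
            rows = sf_cur.fetchmany(BATCH)
            if not rows:
                break
            execute_values(
                pg_cur,
                "INSERT INTO patient_coverage_staging "
                "(patient_key, payer_name, plan_type, confidence_score) VALUES %s",
                rows,
            )
finally:
    sf.close()
    pg.close()
```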
Requirements
- 5+ years in data engineering or data platform development roles
- Advanced SQL skills and experience working with wide, high-volume datasets (e.g., 100M+ rows)
- Experience with Snowflake or readiness to quickly ramp up on it, including performance tuning and familiarity with native features (streams, tasks, stages)
- Proficiency in Python for scripting, orchestration, and integration
- Experience working with batch pipelines and familiarity with data warehousing best practices
- Solid understanding of ETL design patterns and ability to work independently in a small, fast-paced team
- Awareness of data compliance standards (HIPAA, PHI de-identification workflows)
Preferred Qualifications
- Experience with Snowpark (Python) or other in-Snowflake processing tools
- Familiarity with payer enrichment workflows or healthcare claims data
- Previous use of classification models, vector similarity, or proximity-based data inference
- Hands-on experience with AWS EC2 and S3, and with integrating cloud resources with Snowflake
- Exposure to PostgreSQL and API integration for analytic workloads