Senior Data Engineer

We are building a greenfield MVP of a healthcare analytics platform focused on patient-level insights from large-scale pharmacy claims data. The platform is being developed by a newly formed client that already has customer interest, and it will be used for real-time patient analytics at the point of service.

All data is ingested via Snowflake Share from external vendors (no ingestion layer needed) and processed through a typical ETL pipeline to create a final patient-level dataset (~300M rows). This normalized output will be loaded into a PostgreSQL database (or comparable RDBMS; final tooling to be confirmed) and served via a low-latency REST API.
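
For context, here is a minimal sketch of the read side of this flow, assuming the vendor share is mounted as a database named VENDOR_CLAIMS_SHARE; all object names and credentials below are placeholders rather than confirmed project identifiers:

```python
# Minimal sketch: query the weekly raw claims drop exposed through the Snowflake Share.
# Account, credentials, and object names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",          # in practice, supplied via environment/secrets manager
    user="etl_user",
    password="...",
    warehouse="ETL_WH",
    database="VENDOR_CLAIMS_SHARE",  # database created from the vendor's share
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM RAW_PHARMACY_CLAIMS")
    print("raw claim rows this cycle:", cur.fetchone()[0])
finally:
    conn.close()
```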

Key pipeline stages include:

  • Standardization (cleansing, mapping, enrichment using BIN/PCN lookups)
  • Projection and extrapolation using a simple classification model or proximity search
  • Summarization to per-patient records
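
As a rough illustration of the standardization stage above, here is a toy BIN/PCN payer-enrichment lookup in plain Python; the mapping values, field names, and claim structure are assumptions rather than the project's actual reference data, and in the real pipeline this would be a reference-table join in Snowflake:

```python
# Illustrative-only BIN/PCN -> payer enrichment; values and field names are hypothetical.
from typing import Optional

# In the real pipeline this mapping would live in a Snowflake reference table.
BIN_PCN_TO_PAYER = {
    ("004336", "ADV"): "Example Payer A",
    ("610014", "MEDDPRIME"): "Example Payer B",
}

def enrich_claim(claim: dict) -> dict:
    """Attach a payer name to a raw claim row using its BIN/PCN pair, if known."""
    payer: Optional[str] = BIN_PCN_TO_PAYER.get((claim.get("bin"), claim.get("pcn")))
    return {**claim, "payer_name": payer}

print(enrich_claim({"claim_id": 1, "bin": "004336", "pcn": "ADV"}))
```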

The data is updated weekly (batch-based system). We are not building the ML model but must integrate with it and support its output. The system will initially serve two core API endpoints:

  1. Given patient info, return plan/coverage info with a confidence score
  2. Given patient info, return medical history
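
A rough sketch of the two response payloads, purely to illustrate the contract; field names and types are assumptions, not a finalized API schema:

```python
# Hypothetical response shapes for the two endpoints; all field names are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class CoverageResponse:
    patient_id: str
    payer_name: str
    plan_name: str
    confidence: float            # score attached by the projection/classification step

@dataclass
class MedicalHistoryEntry:
    fill_date: str               # ISO date of the pharmacy claim
    ndc: str                     # National Drug Code
    drug_name: str

@dataclass
class MedicalHistoryResponse:
    patient_id: str
    history: List[MedicalHistoryEntry]
```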

You will be part of a lean, senior-level engineering team and expected to own key parts of the ETL and data modeling effort.

Key Responsibilities

  • Build performant and scalable ETL pipelines in Snowflake, transforming wide raw claims datasets into normalized outputs
  • Apply cleansing, mapping, enrichment logic (e.g., payer enrichment via BIN/PCN lookups)
  • Collaborate on projection/extrapolation workflows, integrating with classification models or rules-based engines
  • Load processed outputs into PostgreSQL to power real-time REST API endpoints (a rough load sketch follows this list)
  • Tune Snowflake queries for cost-efficiency and speed, and optimize workloads for batch processing (~weekly cadence)
  • Work closely with the API engineer to ensure alignment between data schema and API needs
  • Ensure data privacy and compliance, and coordinate PHI de-identification (with Datavant)
  • Contribute to architectural decisions and the implementation roadmap in a fast-moving MVP cycle
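
As one possible shape of the PostgreSQL load step referenced above, here is a sketch that bulk-loads a CSV extract of the patient-level output with COPY; table, column, file, and connection details are placeholders, and the real pipeline may stage exports from Snowflake to S3 first:

```python
# Sketch of a bulk load into PostgreSQL; all names and connection details are placeholders.
import psycopg2

LOAD_SQL = """
    COPY patient_summary (patient_id, payer_name, plan_name, confidence, last_fill_date)
    FROM STDIN WITH (FORMAT csv, HEADER true)
"""

conn = psycopg2.connect("dbname=analytics user=loader")  # connection details are assumptions
try:
    with conn, conn.cursor() as cur, open("patient_summary_week.csv") as f:
        cur.copy_expert(LOAD_SQL, f)  # streams the file through COPY for fast bulk ingest
finally:
    conn.close()
```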

Requirements

  • 5+ years in data engineering or data platform development roles
  • Advanced SQL skills and experience working with wide, high-volume datasets (e.g., 100M+ rows)
  • Experience with Snowflake or readiness to quickly ramp up on it, including performance tuning and familiarity with native features (streams, tasks, stages)
  • Proficiency in Python for scripting, orchestration, and integration
  • Experience working with batch pipelines and familiarity with data warehousing best practices
  • Solid understanding of ETL design patterns and ability to work independently in a small, fast-paced team
  • Awareness of data compliance standards (HIPAA, PHI de-identification workflows)

Preferred Qualifications

  • Experience with Snowpark (Python) or other in-Snowflake processing tools
  • Familiarity with payer enrichment workflows or healthcare claims data
  • Previous use of classification models, vector similarity, or proximity-based data inference
  • Hands-on experience with AWS (EC2, S3) and integrating cloud resources with Snowflake
  • Exposure to PostgreSQL and API integration for analytic workloads
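
For candidates less familiar with Snowpark, here is a minimal sketch of how the per-patient summarization stage could be expressed in-Snowflake with Snowpark Python; connection details, table names, and column names are hypothetical:

```python
# Hypothetical Snowpark sketch of the per-patient summarization stage.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, count, max as max_

# Connection parameters would normally come from a secrets manager; placeholders here.
session = Session.builder.configs({
    "account": "your_account",
    "user": "etl_user",
    "password": "...",
    "warehouse": "ETL_WH",
    "database": "ANALYTICS",
    "schema": "STAGING",
}).create()

# Enriched claims table produced by the standardization stage (name is an assumption).
claims = session.table("ENRICHED_CLAIMS")

# Summarize to one row per patient: claim count and most recent fill date.
patient_summary = claims.group_by("PATIENT_ID").agg(
    count(col("CLAIM_ID")).alias("CLAIM_COUNT"),
    max_(col("FILL_DATE")).alias("LAST_FILL_DATE"),
)

# Persist the per-patient output for the downstream PostgreSQL load.
patient_summary.write.save_as_table("PATIENT_SUMMARY", mode="overwrite")
```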
