Senior/Lead Data Engineer
We are seeking an experienced Data Engineer to support complex projects, or lead individual ones, that address data engineering requirements and initiatives. The role covers data projects from across the business, including Clinical, Pre-Clinical, Non-Clinical, Chemistry, Real World Data (RWD), and Omics.
Essential Functions
• Support the design, development, and maintenance of data pipelines for processing Research and Development data from diverse sources (Clinical Trials, Medical Devices, Pre-Clinical, Omics, Real World Data) utilizing the AWS technology platform.
• Create and optimize ETL/ELT processes for structured and unstructured data using Python, R, SQL, AWS services, and other tools.
• Build and maintain data repositories using AWS S3 and FSx technologies. Establish data warehousing solutions using Amazon Redshift.
• Build and maintain standard data models.
• Develop data quality frameworks, validation processes and KPIs to ensure accuracy and consistency of data pipelines.
• Implement data versioning and lineage tracking to support data traceability, regulatory compliance and audit requirements.
• Create and maintain documentation for data processes, architectures, and workflows.
• Implement modern software development best practices (e.g., code versioning, DevOps, CI/CD).
• Maintain compliance with data privacy regulations such as HIPAA and GDPR.
• May be required to develop, deliver or support data literacy training across R&D.
Required Knowledge, Skills and Abilities
• Strong knowledge of data engineering tools such as Python, R and SQL for data processing.
• Strong proficiency with AWS services, particularly S3, Redshift, FSx, Glue, and Lambda.
• Strong proficiency with relational databases.
• Strong background in data modeling and database design.
• Familiarity with non-relational database technologies (e.g., NoSQL) and other database types (e.g., graph).
• Familiarity with containerization technologies such as Docker and EKS/Kubernetes.
• Familiarity with one or more R&D research processes and associated regulatory requirements.
• Exposure to healthcare data standards (CDISC, HL7, FHIR, SNOMED CT, OMOP, DICOM).
• Exposure to big data technologies and large-scale data handling.
• Knowledge of machine learning operations (MLOps) and model deployment.
• Strong problem-solving and analytical abilities.
• Excellent communication and collaboration skills.
• Experience working in an Agile development environment.
Minimum Requirements
• Bachelor’s Degree in Computer Science, Statistics, Mathematics, Life Sciences, or other relevant scientific fields; Master’s Degree preferred.
• 5+ years of experience in data engineering, with at least 1.5 years focusing on healthcare, research, or clinical-related data.
What Do We Offer?
- Remote-first work environment with flexible hours.
- Access to professional development resources: courses, workshops, and certifications.
- A supportive, inclusive team culture that encourages innovation and open communication.
- Competitive salary with annual market-based adjustments and performance-based bonuses.
- Comprehensive health insurance and wellness benefits.
- The latest tools, hardware, and software needed to do your best work.
- Possible reservation (deferment) from military service, subject to an internal queue.
Required skills and experience
| Skill | Experience |
| --- | --- |
| Data Engineering | 4 years |
| Python | 3.5 years |
| R | 2 years |
| AWS S3 | 2.5 years |
| AWS Redshift | 2.5 years |
| NoSQL | 1 year |
| SQL | 3.5 years |
Required languages
| Language | Level |
| --- | --- |
| English | B2 - Upper Intermediate |