Data Scientist (offline)

Project Description:
The data engineer on the products team is responsible for expanding and optimizing our data and data pipeline architecture, as well as improving data flow and transformations.

We are looking for a data engineer who brings expertise in two or more of the following areas:
• Building reliable data pipelines using Spark / Python / R
• Provisioning data / providing access, building REST/similar APIs back-ended by technologies such as PostgreSQL/ElasticSearch/S3/...
• Clinical trial datasets (SDTM/ADaM/...)
• Biological datasets (e.g. omics; DNA/RNA)
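To give a concrete flavour of the first area, a pipeline stage typically extracts records, applies a transformation, and loads the result. A minimal plain-Python sketch (data and field names are hypothetical; a real pipeline would use Spark, Dask, or similar for scale):

```python
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """Parse raw CSV text into a list of row dicts (extract step)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Normalise units and drop incomplete rows (transform step)."""
    out = []
    for row in rows:
        if not row.get("weight_kg"):
            continue  # skip rows with missing measurements
        out.append({"subject": row["subject"],
                    "weight_g": float(row["weight_kg"]) * 1000})
    return out

def load(rows: list[dict]) -> str:
    """Serialise transformed rows back to CSV text (load step)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["subject", "weight_g"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

raw = "subject,weight_kg\nS01,70.5\nS02,\nS03,82.0\n"
result = load(transform(extract(raw)))
```

The same extract/transform/load shape carries over to Spark, where each step becomes a DataFrame operation distributed across a cluster.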
Responsibilities:
- Design, create, test, and maintain optimal data pipeline architecture to ensure that it supports the requirements of the stakeholders
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources
- Develop dataset processes for data modelling, mining, and production
- Deliver clear, maintainable, and well-tested code in a timely manner
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Create data tools/scripts for data curators and data scientists as needed
- Collaborate with scientists, data managers, and technology teams to document and leverage their understanding of the data
- Actively participate in agile work practices
- Adopt and improve on the strong engineering practices followed by technology teams
- Analyze analytical blueprints to identify technical gaps and build best practices
Mandatory Skills:
- Computational/quantitative background
- Degree in a computational science (e.g. computer science, physics, engineering)
- Software engineering experience (versioning, Scrum, testing, collaboration good practices)
- R Programming Language OR Apache Spark OR Python
- REST API
- SQL
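For context on the REST API and SQL items, a service in this role might expose database rows over HTTP. A minimal sketch using only Python's standard library, with an in-memory SQLite database standing in for PostgreSQL (table and endpoint names are hypothetical):

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory database standing in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO datasets (name) VALUES (?)",
                 [("trial_a",), ("omics_b",)])
conn.commit()

def list_datasets() -> list[dict]:
    """Query the datasets table and shape rows for a JSON response."""
    rows = conn.execute("SELECT id, name FROM datasets ORDER BY id").fetchall()
    return [{"id": r[0], "name": r[1]} for r in rows]

class DatasetHandler(BaseHTTPRequestHandler):
    """Serve GET /datasets as JSON; anything else is a 404."""
    def do_GET(self):
        if self.path == "/datasets":
            body = json.dumps(list_datasets()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To serve requests:
# HTTPServer(("localhost", 8000), DatasetHandler).serve_forever()
```

In production the same shape would typically use a web framework and a connection pool against PostgreSQL or Elasticsearch rather than raw `http.server`.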
Nice-to-Have Skills:
- PostgreSQL
- Elasticsearch
- Amazon S3

Additional skills in two or more of the following areas are highly desirable:
- Strong experience with Python for pipelines and engineering practices (e.g. Python + Spark, Dask, Snakemake, etc.)
- Some experience with R for data analysis (SparkR/sparklyr, tidyverse, mlr)
- Experience with computational environments for large-scale processing (high-performance computing and/or Spark)
- Analysis of clinical trial data, including an understanding of data formats and processes
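Regarding clinical trial data formats: SDTM organises trial data into domains (e.g. DM for demographics) with standard variable names such as STUDYID and USUBJID. A hypothetical sketch of summarising a DM-style extract with the standard library:

```python
import csv
import io

# A tiny DM (demographics) domain extract in SDTM-style layout.
# STUDYID, USUBJID, AGE, and SEX are standard SDTM variable names;
# the values below are invented for illustration.
dm_csv = """STUDYID,USUBJID,AGE,SEX
TRIAL01,TRIAL01-001,54,F
TRIAL01,TRIAL01-002,61,M
TRIAL01,TRIAL01-003,47,F
"""

def mean_age_by_sex(raw: str) -> dict:
    """Group DM records by SEX and compute the mean AGE per group."""
    ages_by_sex: dict[str, list[int]] = {}
    for row in csv.DictReader(io.StringIO(raw)):
        ages_by_sex.setdefault(row["SEX"], []).append(int(row["AGE"]))
    return {sex: sum(ages) / len(ages) for sex, ages in ages_by_sex.items()}

summary = mean_age_by_sex(dm_csv)
```

Real SDTM deliverables usually arrive as SAS transport (XPT) files rather than CSV, but the domain/variable conventions are the same.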

About Luxoft

Luxoft is a high-end application outsourcing provider of choice and a trusted technology advisor to Global 2000 and medium-sized growth companies that apply compelling technologies to obtain leadership positions in their respective markets.
Luxoft attracts the best talent and offers career growth and employment benefits. Our teams work on highly complex and innovative projects for leading companies around the globe.

Company website:
https://career.luxoft.com/locations/ukraine/

DOU company page:
https://jobs.dou.ua/companies/luxoft/

The job ad is no longer active
Job unpublished on 2 July 2021
