
About the vacancy

The client is an international company that provides an online genealogy service that helps its clients understand their past and family history.

We are looking for a Data Engineer who will join a team working on the maintenance of the data workflow and ingestion of scanned newspaper image data. This involves handling a lot of data throughput in a reliable and consistent way.

The specialist will help the existing team to manage the file systems, databases, and data ingestion into Solr, as well as managing internal, web-based tools that the client’s Quality Control team uses to validate images before they are published.

There is also an element of DevOps and Systems Administration - the team works with a significant number of physical and virtual servers, handling deployment pipelines, etc.

In the coming months, the client will investigate using Machine Learning techniques to improve the quality of their OCR. Some ML work is likely over the course of this project, but it will constitute only part of the role.

There are multiple teams consisting of 5-7 people. The teams include DataArt engineers and stakeholders from the client side working in a mature Agile environment.

We hire people for the company, not for a single project. If a project (or your work on it) ends, you move to another project or to a paid “Idle” period.

Responsibilities
- Managing file systems, databases, and data ingestion into Solr; operating Solr at scale
- Handling large amounts of XML
- Management of internal, web-based tools
- Potentially applying ML techniques to improve OCR quality, likely after a few months

Must have
- Experience with SQL (MySQL) databases and handling large amounts of data
- Comfortable working from the terminal in Linux/Unix (Ubuntu)
- Good knowledge of at least one programming language (Ruby, Python etc.)
- A hands-on approach to getting stuff done
- A curiosity to learn and widen your skillset
- Rails (for internal web-based tools)
- Experience with ZFS, XML
- TensorFlow (not used extensively so far; applied in ML work)
- AWS/Azure (used from time to time)
- Experience with Apache Solr

Would be a plus
- Focus on quality, with testing experience and a willingness to pair collaboratively
- Background in DevOps/Systems Administration
- Experience with Docker, Git, Kubernetes
- Experience with XML processing
- Working knowledge of, or an interest in, image data processing

Learn more about our policy of equal opportunities in employment

About DataArt

Since 1997, DataArt has been designing, developing, modernizing, and supporting IT solutions in finance, healthcare, telecom, travel, media, and the Internet of Things.

DataArt's clients, companies with unique business models, are based in the US and the UK, while its distributed teams of specialists work from development centers in Russia, Ukraine, Poland, Argentina, and Bulgaria.

Company website:
https://dataart.ua

DOU company page:
https://jobs.dou.ua/companies/dataart/

Job posted on 15 July 2021
