Data Engineer (offline)

Big Data

• Experience managing big data.
• Be comfortable:
◦ Handling hundreds of terabytes of data in different databases across multiple services.
◦ Processing massive amounts of data in short bursts (~100 billion items in the span of ~2 days).
◦ Finding solutions to time-sensitive problems that will inevitably occur.

Comprehensive AWS experience:

• DynamoDB
◦ Experience handling DynamoDB when pushed well outside the usual use cases/throughputs. Stack Overflow is of little use at that point; hands-on experience is needed.
◦ Understand how to take full advantage of partition/sort keys and secondary indexes (Global vs Local).
◦ Understand the inner workings of the service and common pitfalls (hot partitions, item collection limits, etc.).
◦ Understand throughput provisioning and its quirks (scaling limits per day, variable scale transition speeds, etc.).
◦ Fully understand DynamoDB pricing (some aspects are not fully documented).
◦ Understand how table latencies vary depending on the activity and size of the table.
• ElasticSearch
◦ Experience writing queries, both for maintaining the current code and for building new features.
◦ Infrastructure knowledge: indexes, index aliases, shards, nodes and metric interpretation (indexing rate, search rate, search latency, etc.).
◦ Time-based indexes: the SERPs cluster is adopting this paradigm, so querying them and handling index rollovers (and potentially automating them) is important.
• S3
◦ Understand throughput limits and how to take full advantage of them.
◦ S3 as a data lake and how it interfaces with the pipeline.
◦ Storage tiers and their pros and cons.
• SQS
• EC2
• ECS
• Batch
◦ Be able to run thousands of parallel batch jobs.
• CloudWatch
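To make the partition/sort-key and secondary-index points concrete, here is a minimal sketch of a DynamoDB table definition in the shape `boto3`'s `create_table` expects. The table name, attribute names, and capacity numbers are hypothetical, invented for illustration only.

```python
# Sketch of a DynamoDB table definition illustrating a partition key, a sort
# key, and a Global Secondary Index (GSI). All names/values are hypothetical.
table_definition = {
    "TableName": "serp_items",
    "KeySchema": [
        {"AttributeName": "domain", "KeyType": "HASH"},       # partition key
        {"AttributeName": "crawled_at", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "domain", "AttributeType": "S"},
        {"AttributeName": "crawled_at", "AttributeType": "S"},
        {"AttributeName": "keyword", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexes": [
        {
            # A GSI gets its own partition/sort key and, in provisioned mode,
            # its own throughput. A Local Secondary Index, by contrast, must
            # reuse the table's partition key and shares its throughput.
            "IndexName": "keyword-index",
            "KeySchema": [
                {"AttributeName": "keyword", "KeyType": "HASH"},
                {"AttributeName": "crawled_at", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 5,
                "WriteCapacityUnits": 5,
            },
        }
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
}
# With credentials configured, this could be passed to
# boto3.client("dynamodb").create_table(**table_definition).
```

Choosing a high-cardinality partition key (here `domain`) is what spreads writes across partitions and avoids the hot-partition pitfall mentioned above.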
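The time-based-index pattern for Elasticsearch can be sketched as follows; the index prefix, alias name, and `@timestamp` field are assumptions for illustration, not the actual cluster's configuration.

```python
from datetime import date

# Hypothetical time-based index naming: one index per day, queried through a
# read alias that fans out across all daily indexes it points at.
def index_for_day(day: date, prefix: str = "serps") -> str:
    """Return the daily index name, e.g. 'serps-2021-03-04'."""
    return f"{prefix}-{day.isoformat()}"

# A typical Elasticsearch query body sent to the read alias: a date-range
# filter that lets the cluster skip daily indexes outside the window.
query = {
    "query": {
        "range": {
            "@timestamp": {"gte": "now-7d/d", "lt": "now/d"}
        }
    }
}

print(index_for_day(date(2021, 3, 4)))  # serps-2021-03-04
```

Rolling over to a fresh index per day (or via the rollover API on size thresholds) keeps individual shards small, which is what makes the index rollovers mentioned above worth automating.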

Architecture Design

• Ability to create infrastructures that must:
◦ Handle big-data throughputs
◦ Scale
◦ Perform their tasks in a reasonable amount of time
◦ Perform their tasks as cheaply as possible (it's easy to end up with five-digit bills if not careful)
• Understand the strengths and weaknesses of each service and know when to use which.
• Understand AWS services costs across the board.

Nice to Have

It would be good to have these complementary skill sets; long term, we'll need to figure out how to cover these areas, even though ML / DS is not yet a full-time role.

Machine Learning

• Experience with optimization problems with complex requirements / customized training environments.
• ML on tabular data (NLP will most probably be needed at some point).
• Experience with random forest, gradient boosting, and neural network classifiers/regressors.
• Experience with scikit-learn. PyTorch also appreciated.
• Understand performance metrics (Precision, Recall, F1-Score, MAPE, MSE, RMSE) and when to use them.
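As a quick refresher on the classification metrics listed above, here is a toy example computing precision, recall, and F1 by hand on made-up labels (scikit-learn's `precision_score`, `recall_score`, and `f1_score` give the same results):

```python
# Toy binary-classification example; labels are invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)  # 0.75 0.75 0.75
```

Which metric matters depends on the cost of errors: precision when false positives are expensive, recall when misses are, F1 as a balance; MAPE/MSE/RMSE play the analogous role for regression.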

Data Science

• Needed for creating clean datasets for the ML models.
• Experience with Pandas or an equivalent framework.
• Be able to manage noisy data and turn it into usable datasets.
• Google Search Console and Google Analytics.
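The kind of noisy-to-usable cleanup described above can be sketched in a few Pandas operations; the column names and sample rows below are hypothetical, stand-ins for whatever a Search Console export actually contains.

```python
import pandas as pd

# Minimal sketch of turning noisy rows into a usable dataset
# (column names and values are hypothetical).
raw = pd.DataFrame({
    "keyword": ["seo", "seo", "pwa", None],
    "clicks": ["10", "10", "oops", "3"],
})

df = (
    raw.drop_duplicates()              # remove exact duplicate rows
       .dropna(subset=["keyword"])     # rows without a keyword are unusable
       .assign(clicks=lambda d: pd.to_numeric(d["clicks"], errors="coerce"))  # "oops" -> NaN
       .dropna(subset=["clicks"])      # drop rows whose clicks could not be parsed
       .reset_index(drop=True)
)
# df now holds a single clean row: keyword "seo" with 10 clicks.
```

The same pipeline shape (dedupe, coerce types, drop or impute the unparseable) covers most of the cleanup needed before the data can feed an ML model.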

Industry experience / knowledge

• Understand the SEO world
• Deep understanding of the domain in question (SEO metrics) is needed to be able to apply ML successfully.

About CHI Software

Our company has been on the market since 2004. Our main focus is developing solutions for e-commerce and information portals based on PWA (Progressive Web Applications) technology.
Technology stack:
PHP, JavaScript, NodeJS, ReactJS, Magento, Drupal

Company website:
http://wdg-company.com/

DOU company page:
https://jobs.dou.ua/companies/wise/

The job ad is no longer active
Job unpublished on 4 March 2021
