Data Scientist, NLP Engineer
Experience
For the past 2.5 years I have worked at US IT startups as a Data Scientist in the NLP field, working with trade and crime data.
- NER model with the Prodigy package (data preparation, labeling, modeling, error analysis, iteration)
- Geocoder (Google APIs, Pelias)
- Supervised models (XGBoost, BiLSTM, transformers) for free-text comparison
- Unsupervised models / custom systems for free-text clustering
- Extensive work with tabular data (SQL)
- Cloud services (AWS, Databricks)
Skills
Python, Machine Learning, SQL, Data Science, Pandas, NumPy, Jupyter Notebook, Git, scikit-learn, Statistics, Matplotlib, English, PyTorch, NLP, Jira, Excel, AWS, Recurrent Neural Networks, Data Analysis, Scientific Research
Highlights
- Developed an end-to-end NER pipeline, using the Prodigy tool for annotation, that reached human-level performance. This improved company data and user experience and significantly increased customer satisfaction.
- Implemented a geocoding pipeline from scratch using Google’s places and addresses APIs, and built a voting system within the pipeline that selects the best of multiple geocoding API outputs.
- Combined the two projects above into an ML-driven system that takes news (media articles) as input and outputs the location (lat-lon pair), time, and actors of a crime event.
- Spent a significant amount of time developing tools that automatically generate data insights and statistics, making the user experience more pleasant and informative.
Preferred languages
Ukrainian, English
$2000 / mo
- Ukraine, Lviv
- 3 years of experience
- English: Advanced/Fluent
- Remote work
- Office
- Published 28 March 2024
- Typically replies in: 3 days
- Response rate 72%