We are starting an internal product related to Information Retrieval from semi-structured texts. In short, we want to build a smart parser that can extract job-post-related information from HTML pages, similar to the Zyte service, and we are looking for an NLP Data Scientist to complete our team.

What do we offer?
Fully remote position with flexible working hours (40 hours per week)
Small team of professionals (PM, BA, Software Engineer, Product Owner, Data Science Consultant from Germany)
Ability to influence our growing company
The freedom and support in developing your own ideas
Full financial and legal support for private entrepreneurs
Low hierarchy and open communication with founders
Paid vacation and days off (20 days)
All Ukrainian public holidays are days off
Compensation for sports activities
Whom are we looking for?
Experienced Python developer with at least 3 years of Python programming experience
Experience with information extraction tasks using heuristics, pattern rules, or NLP methods such as TF-IDF, NER, or similar -> at least 1-2 years of NLP experience
Experience training and deploying NLP models for text classification and feature representation, preferably with deep neural networks
Experience processing messy text data in Python, especially HTML pages and fragments
Experience operating ML models for production inference -> Wrapping in APIs, monitoring, scaling, etc.
Ability to break down large, complex problems and resolve ambiguity by asking the right questions of stakeholders

Technologies Checklist
Experience with at least one of PyTorch, Keras, or TensorFlow for low-level training/inference of deep neural networks in Python
Experience training and using deep NLP models for text classification and/or named-entity recognition (for example BERT, T5, Universal Sentence Encoder, RoBERTa, XLM, etc.) with higher-level libraries like Hugging Face Transformers, spaCy, or similar
Experience with some NLP libraries for text processing in Python: NLTK, spaCy, Gensim, TextBlob, fastText
Generic machine-learning skills for model evaluation, etc.: scikit-learn

BeautifulSoup, Scrapy for scraping and parsing text data
Experience using XPath, CSS selectors, jQuery, or similar for navigating HTML documents
At least one of Flask, Sanic, Falcon, FastAPI for creating web API services in Python
Regexes for rule-based text processing
Experience with Approximate Nearest Neighbour libraries like Annoy, ScaNN, or Faiss for efficient data retrieval

Project description

At the moment we are maintaining a scraping project that gathers job-post information from approximately 600 websites. For each website, we developed a separate web scraper.

We want to reduce maintenance time by building a single scraper for all 600 websites using AI algorithms. Essentially, we want to send an HTML page to an algorithm and receive JSON with a list of populated fields.
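The target interface can be sketched roughly as follows. This is a minimal, hypothetical illustration using only the Python standard library: the field names (`title`, `salary`) and the hand-written heuristics are assumptions for demonstration, not the actual pipeline — in the real system, a trained NLP model would replace the rules.

```python
import json
import re
from html.parser import HTMLParser


class JobPostExtractor(HTMLParser):
    """Toy heuristic extractor: grabs the <title> text and any
    salary-looking pattern from the page text. A real system would
    replace these hand-written rules with a trained NLP model."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.fields = {"title": None, "salary": None}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.fields["title"] is None:
            self.fields["title"] = data.strip()
        # Naive salary heuristic: first dollar amount seen anywhere.
        match = re.search(r"\$\s?\d[\d,]*", data)
        if match and self.fields["salary"] is None:
            self.fields["salary"] = match.group(0)


def extract_job_fields(html: str) -> str:
    """HTML page in, JSON out -- the contract the single scraper should satisfy."""
    parser = JobPostExtractor()
    parser.feed(html)
    return json.dumps(parser.fields)


page = (
    "<html><title>Senior NLP Engineer</title>"
    "<body>Salary: $4,000/month</body></html>"
)
print(extract_job_fields(page))
# -> {"title": "Senior NLP Engineer", "salary": "$4,000"}
```

The point of the sketch is the contract, not the heuristics: whatever model sits behind `extract_job_fields`, the input stays raw HTML and the output stays a fixed JSON schema, so one endpoint can serve all 600 sites.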

About DataOx

We rebranded to DataOx in spring 2020!
We're a software development company focused on delivering the best services for business. We try not just to carry out tasks but to thoroughly understand the client's business needs, which allows us to implement truly valuable software solutions.

Our Services:

• Web Scraping Services;
• Custom Software Development;
• Data Management Systems.

If you have solid knowledge of and experience with Java, Spring, or Python, welcome to our team!

Job posted on 3 September 2021
