Data Engineer (Automation, Data Parsing) (offline)

Requirements:
- Strong knowledge of scraping and parsing practices
- Strong knowledge of Python
- JavaScript knowledge is preferable
- MongoDB, PostgreSQL
- Celery
- OOP
- Linux
- Git
- REST API
- Pet projects on GitHub/GitLab
- Organized, responsible, and fast-learning person

Do not hesitate to apply if you are missing specific experience.

We offer:
- WFH and flexible working hours
- Zero bureaucracy

Responsibilities:
- Design and implement web scraping solutions using tools like
- Develop and maintain web scraping scripts and pipelines for data extraction.
- Optimize scraping processes for efficiency and scalability.
- Monitor and troubleshoot scraping processes to ensure reliability.
- Work closely with cross-functional teams to understand data requirements and deliver timely results.

Project description:
The first project you will work on automates the parsing and processing of posts from public social media groups. It consists of the following steps:

1. Parsing posts from social networks into a Google Sheet.
2. Processing (data enrichment) of the received posts using ChatGPT and adding the ChatGPT results to the Google Sheet.
3. Geocoding (getting geocoordinates from an address) using a geoservice like https://positionstack.com and recording the obtained coordinates in a Google Sheet.
4. Duplicate analysis (retrieving existing announcements from another website via that site's API, comparing them with the newly received posts from social networks, and removing duplicates).
5. Moderator review (allowing a moderator to mark unsuitable posts).
6. Publication of announcements on the other site via that site's API (the script must send new posts to the site's publication API).
7. Writing the link to each created announcement into the resulting file (updating the Google Sheet with links to the published posts).
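
For illustration only, step 4 (duplicate analysis) could be sketched as below. This is a minimal sketch under stated assumptions, not a prescribed approach: posts and existing announcements are assumed to be plain dicts with a hypothetical `text` field, and two items are treated as duplicates when their texts are equal after simple normalization. The function names are made up for this example, and real data may well need fuzzy matching instead.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def fingerprint(text: str) -> str:
    """Stable hash of the normalized text, used as the duplicate key."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def drop_duplicates(new_posts, existing_announcements):
    """Return the new posts whose text matches neither an existing
    announcement nor an earlier post in the same batch.

    Both arguments are lists of dicts with a "text" key (an assumption
    made for this sketch; the real field names are not specified).
    """
    seen = {fingerprint(item["text"]) for item in existing_announcements}
    unique = []
    for post in new_posts:
        key = fingerprint(post["text"])
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    existing = [{"text": "Room for rent: Main St. 5"}]
    incoming = [
        {"text": "room for rent  main st 5!"},
        {"text": "Bike for sale"},
    ]
    for post in drop_duplicates(incoming, existing):
        print(post["text"])
```

Exact matching on a normalized fingerprint is cheap and deterministic, but it misses reworded reposts, so a candidate may want to discuss similarity-based alternatives in their answer.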

Please describe how you would approach this project and what tools or platforms you would use to complete it. Add anything else that you think is important for us to know.
Have you worked on similar projects? If so, please describe them.

About Valid-X

We are a growing company with interesting projects, a fun work environment, great perks and flexibility. Valid-X is known for its transparent approach to software development. Check out our job openings and send us your CV.

Company website:
http://valid-x.com/

DOU company page:
https://jobs.dou.ua/companies/valid-x/

The job ad is no longer active
