Data Miner (Scraping), up to $1000

We are looking for a motivated Junior Python developer who wants to get hands-on experience in large-scale data collection and transformation – the perfect stepping stone toward becoming a Data Engineer.

 

You will be part of a small scraping team that builds and maintains pipelines that extract millions of rows of public data daily, clean and normalize them, and export the results as structured CSV/Parquet files for downstream analytics and ML teams.

This is a pure Python role – no Selenium, no Playwright, no Puppeteer, no headless browsers at all. We only use requests, httpx, aiohttp, proxies, and clever HTML parsing (BeautifulSoup, parsel, lxml).
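
To give a concrete feel for that stack, here is a minimal sketch of the kind of scraper we mean. The URL, headers, and CSS selectors are placeholders for illustration only, not a real target:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder target and selectors, purely for illustration.
URL = "https://example.com/listings?page=1"

HEADERS = {
    # Realistic browser-like headers: the "light" anti-bot approach.
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def scrape_page(url: str) -> list[dict]:
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "lxml")
    rows = []
    for card in soup.select("div.listing-card"):  # placeholder selector
        title = card.select_one("h2")
        price = card.select_one(".price")
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return rows


if __name__ == "__main__":
    print(scrape_page(URL))
```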

 

Key Responsibilities

  • Write and maintain robust web-scraping scripts in Python using requests / httpx / aiohttp (sync + async)
  • Parse HTML/XML/JSON responses with BeautifulSoup, parsel, or lxml
  • Handle anti-bot measures the "light" way: realistic headers, human behavior simulation
  • Normalize and clean scraped data (deduplication, schema enforcement, type casting, handling missing/inconsistent fields)
  • Export clean datasets to CSV or Parquet with proper headers and encoding (see the sketch after this list)
  • Add logging, monitoring (healthchecks, alerts on failures), and basic tests
  • Refactor existing scrapers for reliability and speed
  • Document each scraper (target site, fields extracted, known edge cases, retry strategy)
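
As a rough illustration of the normalization and export bullets above, here is a minimal pandas sketch. Column names, cleaning rules, and file names are made up for the example, and Parquet export assumes pyarrow or fastparquet is installed:

```python
import pandas as pd

# Made-up raw records standing in for freshly scraped rows.
raw_rows = [
    {"title": " Widget A ", "price": "1,299", "city": "Kyiv"},
    {"title": "Widget A",   "price": "1299",  "city": "Kyiv"},  # duplicate once cleaned
    {"title": "Widget B",   "price": None,    "city": "Lviv"},  # missing price
]

df = pd.DataFrame(raw_rows)

# Schema enforcement and type casting.
df["title"] = df["title"].str.strip()
df["price"] = pd.to_numeric(
    df["price"].str.replace(",", "", regex=False), errors="coerce"
)

# Deduplication and handling of missing fields.
df = df.drop_duplicates(subset=["title", "city"])
df = df.dropna(subset=["price"])

# Export: utf-8-sig writes a BOM so Excel opens the CSV correctly;
# Parquet preserves dtypes for downstream analytics.
df.to_csv("widgets.csv", index=False, encoding="utf-8-sig")
df.to_parquet("widgets.parquet", index=False)
```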

     

Required Skills & Experience

  • Strong Python (you write clean, readable code and understand async/await)
  • Solid understanding of HTTP (status codes, headers, cookies, sessions, redirects)
  • Experience with requests or httpx (see the short example after this list)
  • Comfortable parsing HTML with BeautifulSoup or parsel
  • Basic knowledge of pandas for data cleaning/normalization
  • Experience saving data to CSV/Parquet
  • Git and Linux command line basics
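
For the HTTP and httpx points above, the level we have in mind is roughly the following; the endpoint and query parameters are hypothetical:

```python
import httpx

BASE_URL = "https://example.com"  # hypothetical site

with httpx.Client(
    base_url=BASE_URL,
    headers={"User-Agent": "Mozilla/5.0", "Accept": "text/html"},
    follow_redirects=True,        # follow 3xx responses instead of stopping on them
    timeout=httpx.Timeout(30.0),
) as client:
    response = client.get("/search", params={"q": "laptops", "page": 1})
    response.raise_for_status()   # fail loudly on 4xx/5xx status codes

    # Cookies set by the server stay on the client for subsequent requests.
    print(response.status_code, dict(client.cookies))
    html = response.text
```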

     

Big Advantages

  • You already have personal scraping projects on GitHub (e.g., scraping e-commerce, real estate, job boards, etc.)
  • Experience with rotating proxies (Luminati/Bright Data, Smartproxy, custom proxy pools)
  • Basic async Python (aiohttp + asyncio) – see the rough sketch after this list
  • Understanding of CSV edge cases (escaping, quoting, UTF-8 BOM, etc.)
  • Desire to eventually move into Data Engineering (Airflow, Spark, DBT, etc.) – we support that path internally
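
For the proxy and async bullets above, here is a rough sketch of what we mean; the proxy URLs and page URLs are placeholders, not real endpoints or credentials:

```python
import asyncio
import itertools

import aiohttp

# Placeholder proxy pool; in practice these come from a provider or config.
PROXIES = itertools.cycle([
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
])

HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # Each request goes out through the next proxy in the rotation.
    async with session.get(url, proxy=next(PROXIES), headers=HEADERS) as resp:
        resp.raise_for_status()
        return await resp.text()


async def main() -> None:
    urls = [f"https://example.com/page/{i}" for i in range(1, 6)]  # placeholder URLs
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(fetch(session, url) for url in urls))
        print(f"fetched {len(pages)} pages")


if __name__ == "__main__":
    asyncio.run(main())
```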

 

Why This Job is Perfect if You Want to Become a Data Engineer

  • You will touch the entire data lifecycle from raw source → clean tables every day
  • After 6–12 months you will naturally start building Airflow DAGs, moving data to PostgreSQL/Snowflake, and writing schema migrations – a direct path to a mid/senior Data Engineer role
  • Small team = you see the impact of your code immediately
  • Lots of mentoring from senior data engineers

 

Our interview rounds:
Test task -> Interview with Hiring Manager -> Meeting with Team Lead -> Offer.

Required experience

Python: 6 months

Required languages

Ukrainian: Native