Data Miner (Scraping), up to $1000
We are looking for a motivated Junior Python developer who wants to get hands-on experience in large-scale data collection and transformation: the perfect stepping stone toward becoming a Data Engineer.
You will be part of a small scraping team that builds and maintains pipelines that extract millions of rows of public data daily, clean and normalize them, and export them as structured CSV/Parquet files for downstream analytics and ML teams.
This is a pure Python role: no Selenium, no Playwright, no Puppeteer, no headless browsers at all. We only use requests, httpx, aiohttp, proxies, and clever HTML parsing (BeautifulSoup, parsel, lxml).
Key Responsibilities
- Write and maintain robust web-scraping scripts in Python using requests / httpx / aiohttp (sync + async)
- Parse HTML/XML/JSON responses with BeautifulSoup, parsel, or lxml
- Handle anti-bot measures the "light" way: realistic headers, human behavior simulation
- Normalize and clean scraped data (deduplication, schema enforcement, type casting, handling missing/inconsistent fields)
- Export clean datasets to CSV or Parquet with proper headers and encoding
- Add logging, monitoring (healthchecks, alerts on failures), and basic tests
- Refactor existing scrapers for reliability and speed
- Document each scraper (target site, fields extracted, known edge cases, retry strategy); see the sketch after this list for the kind of pipeline involved
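To give a concrete feel for the day-to-day work, here is a minimal sketch of such a pipeline in the stack named above (requests, BeautifulSoup, pandas, CSV/Parquet). The URL, CSS selectors, and field names are made-up placeholders, not a real target or our production code:

```python
# Minimal scraping pipeline sketch: fetch -> parse -> normalize -> export.
# The URL, selectors, and field names below are illustrative placeholders.
import logging

import pandas as pd
import requests
from bs4 import BeautifulSoup

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("listing_scraper")

# Realistic browser-like headers (trimmed for brevity)
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def fetch(url: str, retries: int = 3) -> str:
    """GET a page with a simple retry loop; raise after the last attempt."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, headers=HEADERS, timeout=15)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException as exc:
            log.warning("attempt %d/%d failed for %s: %s", attempt, retries, url, exc)
    raise RuntimeError(f"all retries exhausted for {url}")


def parse(html: str) -> list[dict]:
    """Extract one record per listing card (selectors are illustrative)."""
    soup = BeautifulSoup(html, "lxml")
    return [
        {
            "title": card.select_one("h2").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
        }
        for card in soup.select("div.listing")
    ]


def normalize(rows: list[dict]) -> pd.DataFrame:
    """Deduplicate, cast price to a number, drop rows missing required fields."""
    df = pd.DataFrame(rows).drop_duplicates()
    df["price"] = pd.to_numeric(
        df["price"].str.replace(r"[^\d.]", "", regex=True), errors="coerce"
    )
    return df.dropna(subset=["title", "price"])


if __name__ == "__main__":
    html = fetch("https://example.com/listings")  # placeholder URL
    df = normalize(parse(html))
    df.to_csv("listings.csv", index=False, encoding="utf-8")
    df.to_parquet("listings.parquet", index=False)  # needs pyarrow or fastparquet
    log.info("exported %d rows", len(df))
```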
Required Skills & Experience
- Strong Python (you write clean, readable code and understand async/await; see the async sketch after this list)
- Solid understanding of HTTP (status codes, headers, cookies, sessions, redirects)
- Experience with requests or httpx
- Comfortable parsing HTML with BeautifulSoup or parsel
- Basic knowledge of pandas for data cleaning/normalization
- Experience saving data to CSV/Parquet
- Git and Linux command line basics
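On the async/await and httpx points above, a short sketch of the working style we mean: one shared client, concurrent GETs, explicit status-code handling. The URLs are placeholders, not a real target:

```python
# Async fetching sketch with httpx + asyncio; URLs are illustrative placeholders.
import asyncio

import httpx

URLS = [f"https://example.com/page/{i}" for i in range(1, 6)]


async def fetch(client: httpx.AsyncClient, url: str) -> str | None:
    """Fetch one page; return its HTML, or None for non-200 responses."""
    resp = await client.get(url, follow_redirects=True, timeout=10)
    if resp.status_code != 200:
        return None  # a real scraper would log and/or retry here
    return resp.text


async def main() -> None:
    headers = {"User-Agent": "Mozilla/5.0"}  # realistic headers, trimmed here
    async with httpx.AsyncClient(headers=headers) as client:
        pages = await asyncio.gather(*(fetch(client, u) for u in URLS))
    print(sum(p is not None for p in pages), "pages fetched")


if __name__ == "__main__":
    asyncio.run(main())
```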
Big Advantages
- You already have personal scraping projects on GitHub (e.g., scraping e-commerce, real estate, job boards, etc.)
- Experience with rotating proxies (Luminati/Bright Data, Smartproxy, custom proxy pools)
- Basic async Python (aiohttp + asyncio)
- Understanding of CSV edge cases (escaping, quoting, UTF-8 BOM, etc.); a short example follows this list
- Desire to eventually move into Data Engineering (Airflow, Spark, DBT, etc.); we support that path internally
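On the CSV edge cases mentioned above, a short illustration (the records are invented): quote every field so embedded commas and quotes survive, and write a UTF-8 BOM ("utf-8-sig") so spreadsheet tools detect the encoding correctly:

```python
# CSV edge-case sketch: explicit quoting and UTF-8 BOM. The rows are made up.
import csv

rows = [
    {"title": 'Flat, 2 rooms, "center"', "price": "45000"},  # comma and quotes inside a field
    {"title": "Будинок з садом", "price": "120000"},         # non-ASCII data
]

# newline="" avoids blank lines on Windows; "utf-8-sig" prepends a BOM for Excel.
with open("listings.csv", "w", newline="", encoding="utf-8-sig") as fh:
    writer = csv.DictWriter(fh, fieldnames=["title", "price"], quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
```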
Why This Job is Perfect if You Want to Become a Data Engineer
- You will touch the entire data lifecycle, from raw source to clean tables, every day
- After 6–12 months you will naturally start building Airflow DAGs, moving data to PostgreSQL/Snowflake, and writing schema migrations: a direct path to a mid/senior Data Engineer role
- Small team = you see the impact of your code immediately
- Lots of mentoring from senior data engineers
Our hiring rounds:
Success test → Interview with the Hiring Manager → Meeting with the Team Lead → Offer.
Required skills and experience

| Skill | Experience |
| --- | --- |
| Python | 6 months |
Required languages

| Language | Level |
| --- | --- |
| Ukrainian | Native |