Data Miner (Scraping), up to $1000

We are looking for a motivated Junior Python developer who wants to get hands-on experience in large-scale data collection and transformation – the perfect stepping stone toward becoming a Data Engineer.

 

You will be part of a small scraping team that builds and maintains pipelines that extract millions of rows of public data daily, clean and normalize them, and export the results as structured CSV/Parquet files for downstream analytics and ML teams.

This is a pure Python role – no Selenium, no Playwright, no Puppeteer, no headless browsers at all. We only use requests, httpx, aiohttp, proxies, and clever HTML parsing (BeautifulSoup, parsel, lxml).
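
To give a concrete feel for that stack, here is a minimal sketch of the kind of scraper we mean. The URL, headers, and CSS selectors are placeholders for illustration only, not a real target:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder target and selectors, purely for illustration.
URL = "https://example.com/listings?page=1"

HEADERS = {
    # Realistic browser-like headers: the "light" anti-bot approach.
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def scrape_page(url: str) -> list[dict]:
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "lxml")
    rows = []
    for card in soup.select("div.listing-card"):  # placeholder selector
        title = card.select_one("h2")
        price = card.select_one(".price")
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return rows


if __name__ == "__main__":
    print(scrape_page(URL))
```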

 

Key Responsibilities

  • Write and maintain robust web-scraping scripts in Python using requests / httpx / aiohttp (sync + async)
  • Parse HTML/XML/JSON responses with BeautifulSoup, parsel, or lxml
  • Handle anti-bot measures the "light" way: realistic headers, human behavior simulation
  • Normalize and clean scraped data (deduplication, schema enforcement, type casting, handling missing/inconsistent fields)
  • Export clean datasets to CSV or Parquet with proper headers and encoding (see the sketch after this list)
  • Add logging, monitoring (healthchecks, alerts on failures), and basic tests
  • Refactor existing scrapers for reliability and speed
  • Document each scraper (target site, fields extracted, known edge cases, retry strategy)
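
As a rough illustration of the normalization and export bullets above, here is a minimal pandas sketch. Column names, cleaning rules, and file names are made up for the example, and Parquet export assumes pyarrow or fastparquet is installed:

```python
import pandas as pd

# Made-up raw records standing in for freshly scraped rows.
raw_rows = [
    {"title": " Widget A ", "price": "1,299", "city": "Kyiv"},
    {"title": "Widget A",   "price": "1299",  "city": "Kyiv"},  # duplicate once cleaned
    {"title": "Widget B",   "price": None,    "city": "Lviv"},  # missing price
]

df = pd.DataFrame(raw_rows)

# Schema enforcement and type casting.
df["title"] = df["title"].str.strip()
df["price"] = pd.to_numeric(
    df["price"].str.replace(",", "", regex=False), errors="coerce"
)

# Deduplication and handling of missing fields.
df = df.drop_duplicates(subset=["title", "city"])
df = df.dropna(subset=["price"])

# Export: utf-8-sig writes a BOM so Excel opens the CSV correctly;
# Parquet preserves dtypes for downstream analytics.
df.to_csv("widgets.csv", index=False, encoding="utf-8-sig")
df.to_parquet("widgets.parquet", index=False)
```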

     

Required Skills & Experience

  • Strong Python (you write clean, readable code and understand async/await)
  • Solid understanding of HTTP (status codes, headers, cookies, sessions, redirects)
  • Experience with requests or httpx (see the short example after this list)
  • Comfortable parsing HTML with BeautifulSoup or parsel
  • Basic knowledge of pandas for data cleaning/normalization
  • Experience saving data to CSV/Parquet
  • Git and Linux command line basics
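
For the HTTP and httpx points above, the level we have in mind is roughly the following; the endpoint and query parameters are hypothetical:

```python
import httpx

BASE_URL = "https://example.com"  # hypothetical site

with httpx.Client(
    base_url=BASE_URL,
    headers={"User-Agent": "Mozilla/5.0", "Accept": "text/html"},
    follow_redirects=True,        # follow 3xx responses instead of stopping on them
    timeout=httpx.Timeout(30.0),
) as client:
    response = client.get("/search", params={"q": "laptops", "page": 1})
    response.raise_for_status()   # fail loudly on 4xx/5xx status codes

    # Cookies set by the server stay on the client for subsequent requests.
    print(response.status_code, dict(client.cookies))
    html = response.text
```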

     

Big Advantages

  • You already have personal scraping projects on GitHub (e.g., scraping e-commerce, real estate, job boards, etc.)
  • Experience with rotating proxies (Luminati/Bright Data, Smartproxy, custom proxy pools)
  • Basic async Python (aiohttp + asyncio) – see the rough sketch after this list
  • Understanding of CSV edge cases (escaping, quoting, UTF-8 BOM, etc.)
  • Desire to eventually move into Data Engineering (Airflow, Spark, DBT, etc.) – we support that path internally
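
For the proxy and async bullets above, here is a rough sketch of what we mean; the proxy URLs and page URLs are placeholders, not real endpoints or credentials:

```python
import asyncio
import itertools

import aiohttp

# Placeholder proxy pool; in practice these come from a provider or config.
PROXIES = itertools.cycle([
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
])

HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # Each request goes out through the next proxy in the rotation.
    async with session.get(url, proxy=next(PROXIES), headers=HEADERS) as resp:
        resp.raise_for_status()
        return await resp.text()


async def main() -> None:
    urls = [f"https://example.com/page/{i}" for i in range(1, 6)]  # placeholder URLs
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(fetch(session, url) for url in urls))
        print(f"fetched {len(pages)} pages")


if __name__ == "__main__":
    asyncio.run(main())
```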

 

Why This Job is Perfect if You Want to Become a Data Engineer

  • You will touch the entire data lifecycle from raw source → clean tables every day
  • After 6–12 months you will naturally start building Airflow DAGs, moving data to PostgreSQL/Snowflake, and writing schema migrations – a direct path to a mid/senior Data Engineer role
  • Small team = you see the impact of your code immediately
  • Lots of mentoring from senior data engineers

 

Our interview rounds:
Test task -> Interview with Hiring Manager -> Meeting with Team Lead -> Offer.

Required experience

Python: 6 months

Required languages

Ukrainian: Native