Data Engineer – Web Scraping/Data Quality (AI-Augmented)

We're looking for a sharp, curious, and driven Data Engineer to join Forecasa, a U.S.-based data startup delivering high-quality real estate data and analytics to lenders and investors.

You'll be part of our Data Acquisition & Quality team, helping scale and improve the systems that collect, validate, and monitor the data powering our platform. We've built serious AI-augmented development workflows internally - Claude Code, autonomous agents, GitLab-based orchestration - and we're looking for engineers who've already formed their own opinions about how to work effectively with these tools.

What You'll Do

  • Develop and maintain Python-based web scrapers, using AI coding assistants to accelerate development while applying your judgment on reliability and edge cases (a minimal sketch follows this list)
  • Use tools like Selenium, BeautifulSoup, Pandas, and PySpark to extract and normalize data efficiently
  • Package scrapers as Docker containers and deploy them to Kubernetes
  • Create and manage Airflow DAGs to orchestrate scraping pipelines (see the DAG sketch under Our Tech Stack below)
  • Build data validation pipelines to catch anomalies, missing values, and inconsistencies
  • Review and refine AI-generated code for production reliability
  • Set up Grafana dashboards to monitor pipeline health and data quality metrics
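
To give a feel for the scrape-and-validate loop above, here is a minimal sketch. It uses requests and BeautifulSoup rather than Selenium for brevity, and the URL, CSS selectors, and field names are hypothetical placeholders, not an actual Forecasa source.

```python
# Minimal, illustrative scrape-and-validate sketch. All selectors, field
# names, and the URL below are placeholders.
import pandas as pd
import requests
from bs4 import BeautifulSoup

EXAMPLE_URL = "https://example.com/listings"  # placeholder, not a real source


def fetch_listings(url: str) -> pd.DataFrame:
    """Fetch a listing page and normalize rows into a DataFrame."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    rows = []
    for card in soup.select("div.listing"):  # hypothetical selector
        address = card.select_one(".address")
        price = card.select_one(".price")
        rows.append({
            "address": address.get_text(strip=True) if address else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return pd.DataFrame(rows, columns=["address", "price"])


def flag_issues(df: pd.DataFrame) -> pd.DataFrame:
    """Basic validation: flag missing addresses and unparsable or non-positive prices."""
    if df.empty:
        return df
    price_num = pd.to_numeric(
        df["price"].str.replace(r"[^\d.]", "", regex=True), errors="coerce"
    )
    return df[df["address"].isna() | price_num.isna() | (price_num <= 0)]


if __name__ == "__main__":
    listings = fetch_listings(EXAMPLE_URL)
    problems = flag_issues(listings)
    print(f"{len(listings)} rows scraped, {len(problems)} flagged for review")
```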

Our Tech Stack

Python · PySpark · Selenium · Airflow · Pandas · Postgres · S3 · Docker · Kubernetes · GitLab · Grafana · Claude Code
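
And a minimal, illustrative Airflow DAG of the shape these scraping pipelines take - the DAG id, schedule, and task bodies are placeholders, not our production orchestration code:

```python
# Minimal, illustrative Airflow DAG: scrape, then validate. Names and
# schedule are placeholders only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_scraper(**context):
    print("scraping source...")  # in production this step might launch a containerized scraper


def validate_output(**context):
    print("running data quality checks...")


with DAG(
    dag_id="example_scrape_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    scrape = PythonOperator(task_id="scrape", python_callable=run_scraper)
    validate = PythonOperator(task_id="validate", python_callable=validate_output)

    scrape >> validate
```

In practice the scrape step would more likely run the Dockerized scraper on Kubernetes (for example via the KubernetesPodOperator) rather than a local Python callable, but the task-dependency shape is the same.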

What We're Looking For

  • Solid Python experience, especially building web scrapers
  • Familiarity with Selenium, BeautifulSoup, or Scrapy
  • Some experience with Docker, Airflow, or other orchestration tools
  • Active use of AI coding tools (Claude Code, Cursor, Copilot, etc.) with opinions about what works and what doesn't - we want to hear about your preferred workflows
  • Strong code review instincts - you can spot issues in code whether you wrote it or an AI did
  • A resourceful, problem-solving mindset - not afraid to dig into a messy site or debug a flaky scraper

Bonus Points For

  • Experience with Grafana or Prometheus for monitoring
  • Exposure to cloud platforms (AWS preferred) and managing scrapers at scale
  • Familiarity with CI/CD and Git workflows (we use GitLab)

About Us

Forecasa delivers enriched real estate transaction data to private lenders and institutional investors. We're a small, fast-moving team with a strong engineering culture. We've invested heavily in AI-augmented development - autonomous coding agents, GitLab orchestration - and we're looking for people who already buy into this direction, not people we need to convince.

Location

Remote – we welcome candidates from anywhere in the world.

To Apply

Tell us about your current AI coding workflow - what tools you use, what you've learned, what you'd do differently. Generic applications without this will be deprioritized.

Required skills and experience

  • Selenium – 1 year
  • Python – 1 year

Required languages

  • English – C1 (Advanced)