Data Engineer – Web Scraping/Data Quality (AI-Augmented)
We're looking for a sharp, curious, and driven Data Engineer to join Forecasa, a U.S.-based data startup delivering high-quality real estate data and analytics to lenders and investors.
You'll be part of our Data Acquisition & Quality team, helping scale and improve the systems that collect, validate, and monitor the data powering our platform. We've built serious AI-augmented development workflows internally - Claude Code, autonomous agents, GitLab-based orchestration - and we're looking for engineers who've already formed their own opinions about how to work effectively with these tools.
What You'll Do
- Develop and maintain Python-based web scrapers, using AI coding assistants to accelerate development while applying your judgment on reliability and edge cases
- Use tools like Selenium, BeautifulSoup, Pandas, and PySpark to extract and normalize data efficiently
- Package scrapers as Docker containers and deploy them to Kubernetes
- Create and manage Airflow DAGs to orchestrate scraping pipelines
- Build data validation pipelines to catch anomalies, missing values, and inconsistencies
- Review and refine AI-generated code for production reliability
- Set up Grafana dashboards to monitor pipeline health and data quality metrics
Our Tech Stack
Python · PySpark · Selenium · Airflow · Pandas · Postgres · S3 · Docker · Kubernetes · GitLab · Grafana · Claude Code
What We're Looking For
- Solid Python experience, especially building web scrapers
- Familiarity with Selenium, BeautifulSoup, or Scrapy
- Some experience with Docker, Airflow, or other orchestration tools
- Active use of AI coding tools (Claude Code, Cursor, Copilot, etc.) with opinions about what works and what doesn't - we want to hear about your preferred workflows
- Strong code review instincts - you can spot issues in code whether you wrote it or an AI did
- A resourceful, problem-solving mindset - not afraid to dig into a messy site or debug a flaky scraper
Bonus Points For
- Experience with Grafana or Prometheus for monitoring
- Exposure to cloud platforms (AWS preferred) and experience managing scrapers at scale
- Familiarity with CI/CD and Git workflows (we use GitLab)
About Us
Forecasa delivers enriched real estate transaction data to private lenders and institutional investors. We're a small, fast-moving team with a strong engineering culture. We've invested heavily in AI-augmented development - autonomous coding agents, GitLab orchestration - and we're looking for people who are already bought in on this direction, not people we need to convince.
Location
Remote – we welcome candidates from anywhere in the world.
To Apply
Tell us about your current AI coding workflow - what tools you use, what you've learned, what you'd do differently. Applications that don't address this will be deprioritized.
Required skills and experience
| Selenium | 1 year |
| Python | 1 year |
Required languages
| English | C1 - Advanced |