Junior Data Engineer – (Python/Web Scraping/Data Quality)
We’re looking for a sharp, curious, and driven Junior Data Engineer to join our team at Forecasa, a U.S.-based data startup focused on delivering high-quality real estate data and analytics to lenders and investors.
In this role, you’ll be part of our Data Acquisition & Quality team, helping us scale and improve the systems that collect, validate, and monitor the data that powers our platform.
What You’ll Do
- Develop and maintain Python-based web scrapers to collect structured and unstructured data from various sources.
- Use tools like Selenium, BeautifulSoup, Pandas, and PySpark to extract and normalize data efficiently.
- Package scrapers as Docker containers and deploy them to Kubernetes.
- Create and manage Airflow DAGs to orchestrate and schedule scraping pipelines.
- Build data validation pipelines to catch anomalies, missing values, and data inconsistencies.
- Set up Grafana dashboards to monitor pipeline health and data quality metrics.
- Collaborate with senior engineers to continuously improve scraper reliability, performance, and coverage.
Our Tech Stack
Python • PySpark • Selenium • Airflow • Pandas • Postgres • S3 • Docker • Kubernetes • GitLab • Grafana
What We're Looking For
- Solid experience in Python, especially in building web scrapers.
- Familiarity with libraries like Selenium, BeautifulSoup, or Scrapy.
- Some experience with Docker, Airflow, or other workflow orchestration tools.
- Basic understanding of data validation, data cleaning, and monitoring best practices.
- A resourceful, problem-solving mindset — you’re not afraid to dig into a messy site or debug a flaky scraper.
Bonus Points For
- Experience working with Grafana or Prometheus for monitoring.
- Exposure to cloud platforms (AWS preferred) and managing scrapers at scale.
- Familiarity with CI/CD and Git workflows (we use GitLab).
About Us
Forecasa is a U.S.-based startup delivering enriched real estate transaction data to private lenders and investors. We’re a small, fast-moving team with a strong engineering culture and a mission to bring clarity and transparency to a fragmented market.
Location
Remote – we welcome candidates from anywhere in the world.
NOTE: Please send all e-mails and other communications through the Djinni website. Thank you.