Forecasa

Joined in 2021
Forecasa is a data-centric startup based in the United States. We focus on providing data and analytics for lenders in the real estate space.

    Senior Data Engineer – (PySpark / Data Infrastructure)

    Full Remote · Worldwide · Product · 5 years of experience · Advanced/Fluent

    We're hiring a Senior Data Engineer to help lead the next phase of our data platform’s growth.

    At Forecasa, we provide enriched real estate transaction data and analytics to private lenders and investors. Our platform processes large volumes of public data, standardizes and enriches it, and delivers actionable insights that drive lending decisions.

    We recently completed a migration from a legacy SQL-based ETL stack (PostgreSQL/dbt) to PySpark, and we're now looking for a senior engineer to take ownership of the new pipeline, maintain and optimize it, and develop new data-driven features to support our customers and internal analytics.

    What You’ll Do

    • Own and maintain our PySpark-based data pipeline, ensuring stability, performance, and scalability.
    • Design and build new data ingestion, transformation, and validation workflows.
    • Optimize and monitor data jobs using Airflow, Kubernetes, and S3.
    • Collaborate with data analysts, product owners, and leadership to define data needs and deliver clean, high-quality data.
    • Support and mentor junior engineers working on scrapers, validation tools, and quality monitoring dashboards.
    • Contribute to the evolution of our data infrastructure and architectural decisions.

    Our Tech Stack

    Python • PySpark • PostgreSQL • dbt • Airflow • S3 • Kubernetes • GitLab • Grafana

    What We’re Looking For

    • 5+ years of experience in data engineering or backend systems with large-scale data processing.
    • Strong experience with PySpark, including building scalable data pipelines and working with large datasets.
    • Solid command of SQL, data modeling, and performance tuning (especially in PostgreSQL).
    • Experience working with orchestration tools like Airflow, and containers via Docker/Kubernetes.
    • Familiarity with cloud storage (preferably S3) and modern CI/CD workflows.
    • Ability to work independently and communicate clearly in a remote, async-first environment.

    Bonus Points

    • Background in real estate or financial data.
    • Experience with data quality frameworks or observability tools (e.g., Great Expectations, Grafana, Prometheus).
    • Experience optimizing PySpark jobs for performance and cost-efficiency.

    Junior Data Engineer – (Python/Web Scraping/Data Quality)

    Full Remote · Worldwide · Product · 1 year of experience · Advanced/Fluent

    We’re looking for a sharp, curious, and driven Junior Data Engineer to join our team at Forecasa, a U.S.-based data startup focused on delivering high-quality real estate data and analytics to lenders and investors.

    In this role, you’ll be part of our Data Acquisition & Quality team, helping us scale and improve the systems that collect, validate, and monitor the data that powers our platform.

    What You’ll Do

    • Develop and maintain Python-based web scrapers to collect structured and unstructured data from various sources.
    • Use tools like Selenium, BeautifulSoup, Pandas, and PySpark to extract and normalize data efficiently.
    • Package scrapers as Docker containers and deploy them to Kubernetes.
    • Create and manage Airflow DAGs to orchestrate and schedule scraping pipelines.
    • Build data validation pipelines to catch anomalies, missing values, and data inconsistencies.
    • Set up Grafana dashboards to monitor pipeline health and data quality metrics.
    • Collaborate with senior engineers to continuously improve scraper reliability, performance, and coverage.
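The validation work described above can be sketched as a minimal pure-Python check. The field names and rules here are assumptions chosen for the example, not Forecasa's actual schema.

```python
# Illustrative sketch of a record-level data-quality check for scraped
# records; field names and rules are hypothetical.

def validate_record(record):
    """Return a list of data-quality issues found in one scraped record."""
    issues = []
    # Required fields must be present and non-empty.
    for field in ("address", "amount", "recorded_date"):
        if not record.get(field):
            issues.append(f"missing {field}")
    # Amounts should parse as positive numbers.
    amount = record.get("amount")
    if amount:
        try:
            if float(amount) <= 0:
                issues.append("non-positive amount")
        except (TypeError, ValueError):
            issues.append("unparseable amount")
    return issues

records = [
    {"address": "123 Main St", "amount": "450000", "recorded_date": "2024-01-15"},
    {"address": "", "amount": "n/a", "recorded_date": "2024-02-01"},
]
bad = {i: validate_record(r) for i, r in enumerate(records) if validate_record(r)}
print(bad)
```

Checks like this are the kind of signal you would then surface in a Grafana dashboard to monitor pipeline health over time.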

    Our Tech Stack

    Python • PySpark • Selenium • Airflow • Pandas • Postgres • S3 • Docker • Kubernetes • GitLab • Grafana

    What We're Looking For

    • Solid experience in Python, especially in building web scrapers.
    • Familiarity with libraries like Selenium, BeautifulSoup, or Scrapy.
    • Some experience with Docker, Airflow, or other workflow orchestration tools.
    • Basic understanding of data validation, data cleaning, and monitoring best practices.
    • A resourceful, problem-solving mindset — you’re not afraid to dig into a messy site or debug a flaky scraper.

    Bonus Points For

    • Experience working with Grafana or Prometheus for monitoring.
    • Exposure to cloud platforms (AWS preferred) and managing scrapers at scale.
    • Familiarity with CI/CD and Git workflows (we use GitLab).

    About Us

    Forecasa is a U.S.-based startup delivering enriched real estate transaction data to private lenders and investors. We’re a small, fast-moving team with a strong engineering culture and a mission to bring clarity and transparency to a fragmented market.

    Location

    Remote – we welcome candidates from anywhere in the world.

     

    NOTE: Please direct all emails and communications through the djinni website. Thank you.
