Senior Data Engineer – (PySpark / Data Infrastructure)
Full Remote · Worldwide · Product · 5 years of experience · English - None
We're hiring a Senior Data Engineer to help lead the next phase of our data platform’s growth.
At Forecasa, we provide enriched real estate transaction data and analytics to private lenders and investors. Our platform processes large volumes of public data, standardizes and enriches it, and delivers actionable insights that drive lending decisions.
We recently completed a migration from a legacy SQL-based ETL stack (PostgreSQL/dbt) to PySpark, and we're now looking for a senior engineer to take ownership of the new pipeline, maintain and optimize it, and develop new data-driven features to support our customers and internal analytics.
What You’ll Do
- Own and maintain our PySpark-based data pipeline, ensuring stability, performance, and scalability.
- Design and build new data ingestion, transformation, and validation workflows.
- Optimize and monitor data jobs using Airflow, Kubernetes, and S3.
- Collaborate with data analysts, product owners, and leadership to define data needs and deliver clean, high-quality data.
- Support and mentor junior engineers working on scrapers, validation tools, and quality monitoring dashboards.
- Contribute to the evolution of our data infrastructure and architectural decisions.
Our Tech Stack
Python • PySpark • PostgreSQL • dbt • Airflow • S3 • Kubernetes • GitLab • Grafana
What We’re Looking For
- 5+ years of experience in data engineering or backend systems with large-scale data processing.
- Strong experience with PySpark, including building scalable data pipelines and working with large datasets.
- Solid command of SQL, data modeling, and performance tuning (especially in PostgreSQL).
- Experience working with orchestration tools like Airflow, and containers via Docker/Kubernetes.
- Familiarity with cloud storage (preferably S3) and modern CI/CD workflows.
- Ability to work independently and communicate clearly in a remote, async-first environment.
Bonus Points
- Background in real estate or financial data
- Experience with data quality frameworks or observability tools (e.g., Great Expectations, Grafana, Prometheus)
- Experience optimizing PySpark jobs for performance and cost-efficiency
-
Data Scientist / Quantitative Risk Analyst
Full Remote · Worldwide · 4 years of experience · English - None
About Forecasa
Forecasa is a profitable, founder‑led SaaS company that turns raw real‑estate transaction data into decision‑grade intelligence for hedge funds, private lenders, and MBS desks. We move fast, value autonomy with accountability, and maintain a culture where clear documentation beats hierarchy.
What you’ll do
- Engineer risk‑focused features (borrower, lender, property, geography) in Python/PySpark.
- Develop and validate PD / LGD models using WoE, IV, logistic regression, GBM, XGBoost, or similar.
- Prototype lender‑health metrics (capital‑diversification, portfolio turnover, market concentration, etc.) for client dashboards.
- Create robust, reproducible data pipelines (git‑versioned, unit‑tested, CI in GitLab).
- Produce concise notebooks & dashboards that can feed automated PDF reports.
Must‑have qualifications
- 4–6+ years in data science, risk analytics, or credit modeling.
- Strong Python (pandas, NumPy, scikit‑learn) and SQL; solid PySpark on distributed data a big plus.
- Hands‑on experience building or validating credit‑risk or fraud models (PD, scorecards, Basel/IFRS 9, etc.).
- Fluency in statistics (inferential tests, multicollinearity, model monitoring).
- Git workflow, code review discipline, and comfort with Agile/Kanban boards.
- Clear written & spoken English; able to summarize findings for non‑technical stakeholders.
Nice‑to‑haves
- Familiarity with U.S. mortgage or private‑lending data.
- Experience with Postgres, MinIO/S3, or dbt.
- Knowledge of BI/visualization tools (Plotly, PowerBI, Looker, etc.).
- Prior work in a fully remote, internationally‑distributed team.
How we work
- Stack: Python • PySpark • PostgreSQL/Snowflake • GitLab CI • AWS & on‑prem Spark
- Communication: Slack, Zoom, Notion. Meetings kept lean; deliverables drive the schedule.
- Culture: Low‑ego, high‑ownership. We favor clarity, rapid feedback loops, and well‑documented processes.
-
Data Engineer – Web Scraping/Data Quality (AI-Augmented)
Full Remote · Worldwide · Product · 1 year of experience · English - C1
We're looking for a sharp, curious, and driven Data Engineer to join Forecasa, a U.S.-based data startup delivering high-quality real estate data and analytics to lenders and investors.
You'll be part of our Data Acquisition & Quality team, helping scale and improve the systems that collect, validate, and monitor the data powering our platform. We've built serious AI-augmented development workflows internally - Claude Code, autonomous agents, GitLab-based orchestration - and we're looking for engineers who've already formed their own opinions about how to work effectively with these tools.
What You'll Do
- Develop and maintain Python-based web scrapers, using AI coding assistants to accelerate development while applying your judgment on reliability and edge cases
- Use tools like Selenium, BeautifulSoup, Pandas, and PySpark to extract and normalize data efficiently
- Package scrapers as Docker containers and deploy them to Kubernetes
- Create and manage Airflow DAGs to orchestrate scraping pipelines
- Build data validation pipelines to catch anomalies, missing values, and inconsistencies
- Review and refine AI-generated code for production reliability
- Set up Grafana dashboards to monitor pipeline health and data quality metrics
Our Tech Stack
Python · PySpark · Selenium · Airflow · Pandas · Postgres · S3 · Docker · Kubernetes · GitLab · Grafana · Claude Code
What We're Looking For
- Solid Python experience, especially building web scrapers
- Familiarity with Selenium, BeautifulSoup, or Scrapy
- Some experience with Docker, Airflow, or other orchestration tools
- Active use of AI coding tools (Claude Code, Cursor, Copilot, etc.) with opinions about what works and what doesn't - we want to hear about your preferred workflows
- Strong code review instincts - you can spot issues in code whether you wrote it or an AI did
- A resourceful, problem-solving mindset - not afraid to dig into a messy site or debug a flaky scraper
Bonus Points For
- Experience with Grafana or Prometheus for monitoring
- Exposure to cloud platforms (AWS preferred) and managing scrapers at scale
- Familiarity with CI/CD and Git workflows (we use GitLab)
About Us
Forecasa delivers enriched real estate transaction data to private lenders and institutional investors. We're a small, fast-moving team with a strong engineering culture. We've invested heavily in AI-augmented development - autonomous coding agents, GitLab orchestration - and we're looking for people who are already bought in on this direction, not people we need to convince.
Location
Remote – we welcome candidates from anywhere in the world.
To Apply
Tell us about your current AI coding workflow - what tools you use, what you've learned, what you'd do differently. Generic applications without this will be deprioritized.
More
Forecasa is a data-centric startup based in the United States. We focus on providing data and analytics for lenders in the real estate space.
Website:
https://www.forecasa.com