Senior Data Engineer (PySpark / Data Infrastructure)
We're hiring a Senior Data Engineer to help lead the next phase of our data platform’s growth.
At Forecasa, we provide enriched real estate transaction data and analytics to private lenders and investors. Our platform processes large volumes of public data, standardizes and enriches it, and delivers actionable insights that drive lending decisions.
We recently completed a migration from a legacy SQL-based ETL stack (PostgreSQL/dbt) to PySpark. We're now looking for a senior engineer to take ownership of the new pipeline, maintain and optimize it, and develop new data-driven features for our customers and internal analytics.
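To give candidates a feel for the codebase, here is a minimal sketch of the kind of standardize-and-enrich step the pipeline runs. The bucket paths and column names below are illustrative only, not our actual schema:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transaction-enrichment").getOrCreate()

# Read raw public-record transactions from S3 (path is hypothetical).
raw = spark.read.parquet("s3a://example-raw/transactions/")

# Standardize and enrich: normalize the county name, derive a
# transaction-year partition column, and drop rows with no sale price.
enriched = (
    raw
    .withColumn("county", F.upper(F.trim(F.col("county"))))
    .withColumn("tx_year", F.year(F.col("recorded_date")))
    .filter(F.col("sale_price").isNotNull())
)

# Write back partitioned by year so downstream jobs can prune partitions.
enriched.write.mode("overwrite").partitionBy("tx_year").parquet(
    "s3a://example-enriched/transactions/"
)
```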
What You’ll Do
- Own and maintain our PySpark-based data pipeline, ensuring stability, performance, and scalability.
- Design and build new data ingestion, transformation, and validation workflows.
- Optimize and monitor data jobs using Airflow, Kubernetes, and S3 (see the orchestration sketch after this list).
- Collaborate with data analysts, product owners, and leadership to define data needs and deliver clean, high-quality data.
- Support and mentor junior engineers working on scrapers, validation tools, and quality monitoring dashboards.
- Contribute to the evolution of our data infrastructure and architectural decisions.
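A large part of the role is keeping jobs like the one above running reliably on a schedule. As a rough sketch of how that orchestration is shaped in Airflow (the DAG id, schedule, and job paths are hypothetical), a daily pipeline might look like:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Daily ingest -> enrich -> validate chain; all names and paths are illustrative.
with DAG(
    dag_id="transactions_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = BashOperator(
        task_id="ingest",
        bash_command="spark-submit jobs/ingest_transactions.py {{ ds }}",
    )
    enrich = BashOperator(
        task_id="enrich",
        bash_command="spark-submit jobs/enrich_transactions.py {{ ds }}",
    )
    validate = BashOperator(
        task_id="validate",
        bash_command="spark-submit jobs/validate_transactions.py {{ ds }}",
    )

    ingest >> enrich >> validate
```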
Our Tech Stack
Python • PySpark • PostgreSQL • dbt • Airflow • S3 • Kubernetes • GitLab • Grafana
What We’re Looking For
- 5+ years of experience in data engineering or backend systems with large-scale data processing.
- Strong experience with PySpark, including building scalable data pipelines and working with large datasets.
- Solid command of SQL, data modeling, and performance tuning (especially in PostgreSQL).
- Experience with orchestration tools such as Airflow and with containers (Docker/Kubernetes).
- Familiarity with cloud storage (preferably S3) and modern CI/CD workflows.
- Ability to work independently and communicate clearly in a remote, async-first environment.
Bonus Points
- Background in real estate or financial data.
- Experience with data quality frameworks or observability tools (e.g., Great Expectations, Grafana, Prometheus).
- Experience optimizing PySpark jobs for performance and cost-efficiency (one common technique is sketched below).
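For that last point, one concrete flavor of the work: when a large fact table is joined against a small reference table, broadcasting the small side avoids shuffling the large one, which is often the first performance and cost win. The paths and join key below are illustrative:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("join-tuning").getOrCreate()

transactions = spark.read.parquet("s3a://example-enriched/transactions/")
counties = spark.read.parquet("s3a://example-ref/county_codes/")  # small lookup

# Broadcasting the small table ships it to every executor, so the large
# transactions table is joined locally instead of being shuffled.
joined = transactions.join(F.broadcast(counties), on="county_fips", how="left")
```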