Senior Data Engineer (Python)
We build one of the world’s largest property intelligence datasets by processing satellite and street-level imagery across the entire United States.
We are looking for a Senior Python Data Engineer to design and operate large-scale data pipelines that generate property attributes from imagery using internal AI models.
This role focuses on building distributed processing systems running on Kubernetes clusters and on-prem GPU infrastructure, capable of processing massive geospatial datasets at national scale.
You will work closely with AI engineers and SaaS platform teams to build robust pipelines that transform imagery into structured property intelligence used by the insurance and financial industries.
Responsibilities
• Design and build large-scale Python processing pipelines for satellite and street-level imagery
• Develop distributed workloads running on Kubernetes clusters
• Build and maintain orchestration workflows using Airflow
• Design and manage data storage and processing workflows using PostgreSQL
• Process geospatial datasets using GeoPandas, GDAL, Shapely, or similar tools
• Integrate large-scale AI inference pipelines running on GPU infrastructure
• Optimize processing performance for massive imagery datasets
• Improve observability, monitoring, and reliability of production pipelines
• Collaborate with AI engineers to support large-scale computer vision inference workflows
• Integrate pipelines with AWS-based services
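To give a flavor of the fan-out pattern these responsibilities describe, here is a minimal sketch using only the Python standard library; the tile IDs, bounding boxes, and placeholder "inference" step are hypothetical, and a real pipeline would dispatch work to Kubernetes pods and GPU inference services rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical imagery tiles: (tile_id, bounding box as lon/lat degrees).
TILES = [
    ("tile-001", (-122.5, 37.7, -122.4, 37.8)),
    ("tile-002", (-122.4, 37.7, -122.3, 37.8)),
]

def process_tile(tile):
    """Stand-in for one AI-inference step over a single imagery tile."""
    tile_id, (min_lon, min_lat, max_lon, max_lat) = tile
    # A real pipeline would run model inference here; we just compute
    # the tile's extent in square degrees as a placeholder attribute.
    area = (max_lon - min_lon) * (max_lat - min_lat)
    return {"tile_id": tile_id, "area_deg2": round(area, 4)}

def run_pipeline(tiles):
    # Fan tiles out across workers, the way a Kubernetes job would
    # fan batches out across pods.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(process_tile, tiles))

if __name__ == "__main__":
    for record in run_pipeline(TILES):
        print(record)
```

In production this batching step would typically be wrapped in an Airflow task so retries and backfills are handled by the orchestrator rather than by application code.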
Requirements
• 7+ years of Python engineering experience
• Strong experience designing distributed data pipelines or large-scale data processing systems
• Experience running workloads on Kubernetes
• Hands-on experience with Airflow or similar orchestration frameworks
• Experience processing large datasets (imagery, geospatial, or other high-volume data)
• Strong understanding of ETL architectures and distributed data systems
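The ETL emphasis in the requirements above can be illustrated with a short, hedged sketch of a load step; `sqlite3` stands in for PostgreSQL so the example stays self-contained, and the table schema and column names are purely hypothetical.

```python
import sqlite3

def load_attributes(records):
    """Load extracted property attributes into a relational store.

    sqlite3 is used here as a stand-in for PostgreSQL; the schema
    (property_id, roof_area_m2) is a hypothetical example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE property_attributes ("
        "property_id TEXT PRIMARY KEY, roof_area_m2 REAL)"
    )
    # executemany gives batched inserts, the same pattern a real
    # loader would use with psycopg2 against PostgreSQL.
    conn.executemany(
        "INSERT INTO property_attributes VALUES (?, ?)", records
    )
    conn.commit()
    return conn

conn = load_attributes([("p-1", 182.5), ("p-2", 240.0)])
count = conn.execute(
    "SELECT COUNT(*) FROM property_attributes"
).fetchone()[0]
print(count)
```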
Nice to Have
• Experience with geospatial processing (GeoPandas, GDAL, Rasterio, Shapely)
• Experience working with satellite imagery, aerial imagery, or remote sensing datasets
• Experience with GPU-based inference pipelines
• Experience with distributed processing frameworks such as Spark, Ray, or Dask
• Experience working with on-prem compute clusters
• Experience with C#
Why Join
You will be working on a platform that processes imagery data at national scale to generate property intelligence for the insurance and financial industries. The system analyzes hundreds of millions of properties using AI and geospatial processing, creating one of the most comprehensive property datasets in the world.
Required skills and experience
| Skill | Experience |
| Python | 7 years |
| Distributed Data Pipelines / Big Data Systems | 4 years |
| Kubernetes | 4 years |
| Large-scale Data Processing | 4 years |
| PostgreSQL | 2 years |
| AWS | 2 years |
| Observability and monitoring | 3 years |
| AI/ML Inference Pipelines | 1 year |
| C# | 6 months |
Required languages
| Language | Level |
| English | C1 - Advanced |