Senior Software Data Engineer
Location: Remote
Job Type: Full-Time (6-month contract with the possibility of extension)
About Us
We are a SaaS company that collects large-scale web data, analyzes it, and transforms it into actionable consumer insights for global brands.
Our offerings include:
- Data-driven dashboards for eCommerce, product development, and social platforms
- Classified catalogs of products, reviews, and social content (posts, videos, comments, etc.)
- Data drops and analytical outputs used by enterprise clients
We work with massive datasets and cutting-edge technologies, and we value collaboration, problem-solving, and continuous learning.
Role Overview
We are looking for a highly skilled Senior Software Data Engineer to design, build, and optimize scalable data pipelines using AWS and the Databricks (DBX) ecosystem.
You will play a key role in ensuring the accuracy, reliability, and timeliness of our data outputs while contributing to our ML, MLflow, and LLM-driven capabilities.
You will collaborate closely with cross-functional teams, including R&D, Product, and Delivery, to validate features, troubleshoot issues, and deliver high-quality insights to clients.
Key Responsibilities
- Design, build, and optimize scalable data pipelines using PySpark and AWS services
- Deliver production-grade data outputs with high accuracy and reliability
- Develop automated testing frameworks to support end-to-end data quality
- Integrate ML, MLflow, and LLM-based workflows into data pipelines
- Troubleshoot and resolve complex data and pipeline-related issues
- Collaborate with Product Managers and Delivery Analysts to ensure release readiness
- Maintain clear documentation and promote best practices across data engineering
- Contribute to continuous improvement of our data infrastructure and workflows
Requirements
- 5+ years of professional experience as a Data Engineer
- Strong expertise in PySpark, Python, and SQL
- Experience with the AWS data ecosystem
- Practical background in automated testing and QA for data pipelines
- Strong debugging and performance optimization skills
- Experience working with Databricks (DBX)
- Excellent communication skills in English and the ability to collaborate across teams
Nice to Have
- Experience working with big data and data lake architectures
- Familiarity with CI/CD and DevOps practices
- Experience with MLflow or LLM-driven pipelines
- Knowledge of data governance and monitoring frameworks
Required Languages
| Language | Level |
| --- | --- |
| English | B2 - Upper Intermediate |
| Ukrainian | B2 - Upper Intermediate |