Python Data Engineer (Databricks, AWS, LLM)
This position is very urgent! I would be happy to have a call with you today and give you all the details.
Location: Remote
Employment Type: Full-time
Contract Duration: 6 months (with possibility of extension)
About Us
We are a SaaS company that collects and analyzes large-scale web data to generate actionable consumer insights for global brands. Our platform powers dashboards and data deliveries built on a classified catalog of products, reviews, and social content (posts, videos, and more).
We work with high-volume, complex datasets and modern data technologies. Collaboration, continuous learning, and a strong problem-solving mindset are core to our culture.
Role Overview
We are looking for a Senior Data Engineer to design, build, and optimize scalable, production-grade data pipelines using AWS and Databricks.
As a key member of the Data Team, you will be responsible for ensuring data reliability, integrity, and performance. You will also contribute to ML and LLM-based workflows (including MLflow), delivering high-quality, client-ready data solutions on time.
You will collaborate closely with R&D, Product, and Delivery teams to validate features, resolve issues, and ensure smooth, reliable delivery of insights to clients.
Responsibilities
- Design, build, and maintain scalable data pipelines using PySpark, Python, and AWS services
- Deliver production-ready data outputs with high accuracy, reliability, and timeliness
- Optimize data processing performance and troubleshoot complex pipeline issues
- Develop and enhance automated testing and data quality frameworks
- Integrate ML and LLM-based workflows (including MLflow) into existing pipelines
- Collaborate with Product Managers and Delivery Analysts to define release readiness and client-facing quality standards
- Promote best practices in data engineering, QA, documentation, and maintainable code
Requirements
- 5+ years of experience as a Data Engineer in production environments
- Strong expertise in PySpark, Python, and SQL
- Hands-on experience with AWS (data services and cloud infrastructure)
- Experience working with Databricks / DBX framework
- Solid background in automated testing and QA for data pipelines
- Proven problem-solving and debugging skills, including pipeline optimization
- Excellent English communication skills and ability to work cross-functionally in distributed teams
Nice to Have / Advantages
- Experience with big data architectures and data lakes
- Familiarity with CI/CD pipelines and DevOps practices
- Experience with MLflow and exposure to LLM-based solutions
- Knowledge of data governance, monitoring, and observability frameworks
Why Join Us
- Work on high-impact data challenges that directly influence client outcomes
- Be part of a collaborative team building data and AI-powered solutions
- Fully remote work environment with flexible setup
- Competitive compensation and opportunity for contract extension
Required Languages
| Language | Level |
| --- | --- |
| English | B2 - Upper Intermediate |