Senior Software/Data Engineer (Python + Distributed Systems)
You’ll be working on event-driven, distributed, and scalable systems in Python, building large-scale data and service pipelines on AWS that support the client’s content enrichment and metadata systems. This is a long-term, fully remote opportunity with a supportive international team, strong ownership from day one, and plenty of room to grow.
The client is an American ebook and audiobook subscription service offering more than one million titles. It began as a site for hosting and sharing documents and has grown into a global digital library with over 1.5 million subscribers in nearly every country worldwide.
The team focuses on building scalable metadata pipelines that combine traditional tooling, in-house ML models, and LLMs. In this role, you’ll design and optimize large-scale data and service pipelines running on AWS, supporting the client’s content enrichment and metadata systems. You’ll work closely with cross-functional teams to design reliable backend services that integrate machine learning models and LLM-based components where needed. The role offers the opportunity to work on cutting-edge generative AI and metadata enrichment problems at a truly global scale.
Platform Tech Stack:
Python 3.10.x, PySpark, Airflow, pytest, Pandas, AWS, Delta Lake, MySQL (Aurora), Databricks, Terraform, Datadog, AWS Lambda, ECS, SQS, SNS, ElastiCache, CloudWatch.
Responsibilities:
● Provide technical leadership, mentorship, and guidance to engineers across the organization, driving secure coding best practices
● Lead the design, implementation, and scaling of event-driven, distributed systems to extract, enrich, and process metadata from large-scale document and media datasets
● Partner with Data Science, Infrastructure, ML Engineering, and Product teams to architect and deliver robust systems that balance scalability, high performance, and rapid iteration
● Contribute to the team’s engineering strategy, identifying gaps, proposing new initiatives, and improving existing frameworks
● Build and maintain scalable APIs and backend services for high-throughput content processing
● Leverage AWS services such as ECS, Lambda, SQS, ElastiCache, and CloudWatch to design and deploy resilient, high-performance systems
● Optimize and refactor existing backend systems for scalability, reliability, and performance
● Ensure system health and data integrity through monitoring, observability, and automated testing
Required skills:
- 7+ years of experience as a professional software engineer
- Strong proficiency in Python
- Proven ability to lead the design and delivery of complex software systems with minimal supervision
- Expertise in designing and architecting large-scale event-driven and distributed systems
- Strong cloud expertise with AWS services such as ECS, Lambda, SQS, SNS, and CloudWatch
- Familiarity with data processing frameworks such as Spark and Databricks
- Experience with infrastructure-as-code tools like Terraform
- Solid understanding of system performance, profiling, and optimization
- Experience leading technical projects and mentoring engineers
- Experience building and maintaining data-intensive backend systems or pipelines
- Upper-Intermediate English or higher
Nice to Have:
● Experience with Scala
● Familiarity with Ruby
● Bachelor’s degree in Computer Science or equivalent professional experience
● Experience with workflow orchestration tools such as Airflow
● Experience integrating ML or LLM-based models into production systems
Working Conditions:
— Work schedule: EST working hours
— Fully Remote
— Engagement: Long-term
— Full-time workload
— 2-week sprints, with kickoff on Monday evening and a demo on Friday evening
Required languages:
— English: B2 (Upper-Intermediate) or higher
— Ukrainian: Native