Senior/Lead Python Engineer (ML Infrastructure)

Our Customer:
Our client is a technology-focused company building high-performance, real-time ML inference systems. The team develops ultra-low-latency engines that process billions of requests per day, integrating ML models with business-critical decision-making pipelines. They are looking for an experienced backend engineer to own and scale production-grade ML services with a strong focus on latency, reliability, and observability.

Your tasks:

  • Lead the design and development of low-latency ML inference services handling massive request volumes.
  • Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs.
  • Collaborate closely with data scientists to deploy ML models seamlessly and reliably in production.
  • Design systems for model versioning, shadowing, and A/B testing at runtime.
  • Ensure high availability, scalability, and observability of production systems.
  • Continuously optimize latency, throughput, and cost-efficiency using modern tools and techniques.
  • Work independently while collaborating with cross-functional teams including Algo, Infrastructure, Product, Engineering, and Business stakeholders.


Required Experience and Skills:

  • B.Sc. or M.Sc. in Computer Science, Software Engineering, or related technical field.
  • 5+ years of experience building high-performance backend or ML inference systems.
  • Expert in Python and experience with low-latency APIs and real-time serving frameworks (e.g., FastAPI, Triton Inference Server, TorchServe, BentoML).
  • Experience with scalable service architectures, message queues (Kafka, Pub/Sub), and asynchronous processing.
  • Strong understanding of model deployment, online/offline feature parity, and real-time monitoring.
  • Experience with cloud environments (AWS, GCP, OCI) and container orchestration (Kubernetes).
  • Familiarity with in-memory and NoSQL databases (Aerospike, Redis, Bigtable) for ultra-fast data access.
  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and alerting/diagnostics best practices.
  • Strong ownership mindset and ability to deliver solutions end-to-end.
  • Passion for performance, clean architecture, and impactful systems.


Would be a plus:

  • Prior experience leading high-throughput, low-latency ML systems in production.
  • Knowledge of real-time feature pipelines and streaming data platforms.
  • Familiarity with advanced monitoring and profiling techniques for ML services.


Working Conditions:

  • Remote work.
  • 5-day working week, 8-hour working day, flexible schedule.

Required skills:

Python

Required languages:

English C1 - Advanced
Published 3 December