Senior/Lead Python Engineer (ML Infrastructure)

Our Customer:
Our client is a technology-focused company building high-performance, real-time ML inference systems. The team develops ultra-low-latency engines that process billions of requests per day, integrating ML models with business-critical decision-making pipelines. They are looking for an experienced backend engineer to own and scale production-grade ML services with a strong focus on latency, reliability, and observability.

Your tasks:

  • Lead the design and development of low-latency ML inference services handling massive request volumes.
  • Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs.
  • Collaborate closely with data scientists to deploy ML models seamlessly and reliably in production.
  • Design systems for model versioning, shadowing, and A/B testing at runtime.
  • Ensure high availability, scalability, and observability of production systems.
  • Continuously optimize latency, throughput, and cost-efficiency using modern tools and techniques.
  • Work independently while collaborating with cross-functional teams including Algo, Infrastructure, Product, Engineering, and Business stakeholders.


Required Experience and Skills:

  • B.Sc. or M.Sc. in Computer Science, Software Engineering, or related technical field.
  • 5+ years of experience building high-performance backend or ML inference systems.
  • Expert in Python and experience with low-latency APIs and real-time serving frameworks (e.g., FastAPI, Triton Inference Server, TorchServe, BentoML).
  • Experience with scalable service architectures, message queues (Kafka, Pub/Sub), and asynchronous processing.
  • Strong understanding of model deployment, online/offline feature parity, and real-time monitoring.
  • Experience with cloud environments (AWS, GCP, OCI) and container orchestration (Kubernetes).
  • Familiarity with in-memory and NoSQL databases (Aerospike, Redis, Bigtable) for ultra-fast data access.
  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and alerting/diagnostics best practices.
  • Strong ownership mindset and ability to deliver solutions end-to-end.
  • Passion for performance, clean architecture, and impactful systems.


Would be a plus:

  • Prior experience leading high-throughput, low-latency ML systems in production.
  • Knowledge of real-time feature pipelines and streaming data platforms.
  • Familiarity with advanced monitoring and profiling techniques for ML services.


Working Conditions:

  • Remote work.
  • 5-day working week, 8-hour working day, flexible schedule.

Required skills:

Python

Required languages:

English C1 - Advanced
Published 3 December