Machine Learning Engineer (Real-Time Inference Systems)

We are looking for a highly skilled and independent Machine Learning Engineer to lead the design and development of next-generation real-time inference services. This role involves building the core engine that powers large-scale algorithmic decision-making, serving billions of daily requests with extremely tight latency and performance requirements.

You will work at the intersection of machine learning, large-scale backend engineering, and business logic, creating robust, high-availability services capable of supporting massive traffic and dynamic real-time decisions. This is an opportunity to own mission-critical systems in a high-performance environment.

Responsibilities

  • Lead the design and development of low-latency algorithmic inference services handling billions of requests per day.
  • Build and scale real-time decision-making engines, combining ML models with business logic under strict SLAs.
  • Work closely with data science teams to deploy models seamlessly into production.
  • Develop systems for model versioning, shadow deployments, and A/B testing in real time.
  • Ensure high availability, scalability, and observability across production services.
  • Continuously optimize latency, throughput, and cost-efficiency using modern tooling and performance techniques.
  • Collaborate cross-functionally with teams across Algo, Infrastructure, Product, Engineering, and Business.
  • Work independently and take ownership of solutions end-to-end.

Requirements

  • B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related field.
  • 5+ years building high-performance backend or ML inference systems.
  • Strong expertise in Python and experience with low-latency serving frameworks such as FastAPI, Triton, TorchServe, or BentoML.
  • Experience with scalable architectures, message queues (Kafka, Pub/Sub), and asynchronous processing.
  • Deep understanding of ML model deployment, feature parity, and real-time monitoring.
  • Strong experience with cloud environments (AWS, GCP, OCI) and Kubernetes.
  • Hands-on experience with in-memory & NoSQL databases such as Redis, Aerospike, or Bigtable.
  • Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry), alerting, and system diagnostics.
  • Strong ownership mindset and the ability to lead solutions independently.
  • Passion for performance, clean architecture, and building impactful production systems.

Required languages

English: C2 (Proficient)