Machine Learning Engineer (Real-Time Inference Systems)

Our client is a leading mobile marketing and audience platform, providing the global app ecosystem with advanced solutions for mobile marketing, audience building, and monetization.

With direct integrations into 500,000+ mobile apps worldwide, they process massive volumes of first-party data to deliver intelligent, real-time, and scalable advertising decisions. Their platform operates at extreme scale, serving billions of requests per day under strict latency and performance constraints.

About the Role

We are looking for a highly skilled, independent, and driven Machine Learning Engineer to own and lead the design and development of our next-generation real-time inference services.

This is a rare opportunity to take ownership of mission-critical systems on a massive scale, working at the intersection of machine learning, large-scale backend engineering, and business logic.

You will build robust, low-latency services that seamlessly combine predictive models with dynamic decision logic, all while meeting extreme requirements for performance, reliability, and scalability.

Responsibilities

  • Own and lead the design and development of low-latency inference services handling billions of requests per day
  • Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs
  • Collaborate closely with Data Science teams to deploy models reliably into production
  • Design and operate systems for model versioning, shadowing, and A/B testing at runtime
  • Ensure high availability, scalability, and observability of production services
  • Continuously optimize latency, throughput, and cost efficiency
  • Work independently while collaborating with stakeholders across Algo, Infra, Product, Engineering, Business Analytics, and Business teams

Requirements

  • B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related technical field
  • 5+ years of experience building high-performance backend or ML inference systems
  • Strong expertise in Python
  • Hands-on experience with low-latency APIs and real-time serving frameworks
    (FastAPI, Triton Inference Server, TorchServe, BentoML)
  • Experience designing scalable service architectures
  • Strong knowledge of async processing, message queues, and streaming systems
    (Kafka, Pub/Sub, SQS, RabbitMQ, Kinesis)
  • Solid understanding of model deployment, online/offline feature parity, and real-time monitoring
  • Experience with cloud platforms (AWS, GCP, or OCI)
  • Strong hands-on experience with Kubernetes
  • Experience with in-memory / NoSQL databases
    (Aerospike, Redis, Bigtable)
  • Familiarity with observability stacks: Prometheus, Grafana, OpenTelemetry
  • Strong sense of ownership and ability to drive solutions end-to-end
  • Passion for performance, clean architecture, and impactful systems

Required languages

English C2 - Proficient
Published 29 January