Machine Learning Engineer (Real-Time Inference Systems)
Our client is a leading platform empowering the global app ecosystem with advanced solutions for mobile marketing, audience building, and monetization.
With direct integrations into 500,000+ mobile apps worldwide, they process massive volumes of first-party data to deliver intelligent, real-time, and scalable advertising decisions. Their platform operates at extreme scale, serving billions of requests per day under strict latency and performance constraints.
About the Role
We are looking for a highly skilled, independent, and driven Machine Learning Engineer to own and lead the design and development of our next-generation real-time inference services.
This is a rare opportunity to take ownership of mission-critical systems on a massive scale, working at the intersection of machine learning, large-scale backend engineering, and business logic.
You will build robust, low-latency services that seamlessly combine predictive models with dynamic decision logic, while meeting extreme requirements for performance, reliability, and scalability.
Responsibilities
- Own and lead the design and development of low-latency inference services handling billions of requests per day
- Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs
- Collaborate closely with Data Science teams to deploy models reliably into production
- Design and operate systems for model versioning, shadowing, and A/B testing in runtime
- Ensure high availability, scalability, and observability of production services
- Continuously optimize latency, throughput, and cost efficiency
- Work independently while collaborating with stakeholders across Algo, Infra, Product, Engineering, Business Analytics, and Business teams
Requirements
- B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related technical field
- 5+ years of experience building high-performance backend or ML inference systems
- Strong expertise in Python
- Hands-on experience with low-latency APIs and real-time serving frameworks (FastAPI, Triton Inference Server, TorchServe, BentoML)
- Experience designing scalable service architectures
- Strong knowledge of async processing, message queues, and streaming systems (Kafka, Pub/Sub, SQS, RabbitMQ, Kinesis)
- Solid understanding of model deployment, online/offline feature parity, and real-time monitoring
- Experience with cloud platforms (AWS, GCP, or OCI)
- Strong hands-on experience with Kubernetes
- Experience with in-memory / NoSQL databases (Aerospike, Redis, Bigtable)
- Familiarity with observability stacks: Prometheus, Grafana, OpenTelemetry
- Strong sense of ownership and ability to drive solutions end-to-end
- Passion for performance, clean architecture, and impactful systems
Required languages
| English | C2 - Proficient |