Machine Learning Engineer (Real-Time Inference Systems)
Our client is a leading platform empowering the global app ecosystem with advanced solutions for mobile marketing, audience building, and monetization.
With direct integrations into 500,000+ mobile apps worldwide, they process massive volumes of first-party data to deliver intelligent, real-time, and scalable advertising decisions. Their platform operates at extreme scale, serving billions of requests per day under strict latency and performance constraints.
About the Role
We are looking for a highly skilled, independent, and driven Machine Learning Engineer to own and lead the design and development of our next-generation real-time inference services.
This is a rare opportunity to take ownership of mission-critical systems on a massive scale, working at the intersection of machine learning, large-scale backend engineering, and business logic.
You will build robust, low-latency services that seamlessly combine predictive models with dynamic decision logic, while meeting extreme requirements for performance, reliability, and scalability.
Responsibilities
- Own and lead the design and development of low-latency inference services handling billions of requests per day
- Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs
- Collaborate closely with Data Science teams to deploy models reliably into production
- Design and operate systems for model versioning, shadowing, and A/B testing in runtime
- Ensure high availability, scalability, and observability of production services
- Continuously optimize latency, throughput, and cost efficiency
- Work independently while collaborating with stakeholders across Algo, Infra, Product, Engineering, Business Analytics, and Business teams
Requirements
- B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related technical field
- 5+ years of experience building high-performance backend or ML inference systems
- Strong expertise in Python
- Hands-on experience with low-latency APIs and real-time serving frameworks (FastAPI, Triton Inference Server, TorchServe, BentoML)
- Experience designing scalable service architectures
- Strong knowledge of async processing, message queues, and streaming systems (Kafka, Pub/Sub, SQS, RabbitMQ, Kinesis)
- Solid understanding of model deployment, online/offline feature parity, and real-time monitoring
- Experience with cloud platforms (AWS, GCP, or OCI)
- Strong hands-on experience with Kubernetes
- Experience with in-memory / NoSQL databases (Aerospike, Redis, Bigtable)
- Familiarity with observability stacks: Prometheus, Grafana, OpenTelemetry
- Strong sense of ownership and ability to drive solutions end-to-end
- Passion for performance, clean architecture, and impactful systems
Required languages
| English | C2 - Proficient |