Senior Machine Learning Ops Engineer

Description

Who is our client:
Our client is a global data products and technology company. They are on a mission to transform marketing by building the fastest, most connected data platform that bridges marketing strategy to scaled activation.
They work with agencies and clients to transform the value of data by bringing together technology, data, and analytics capabilities, delivering this through an AI-enabled media and data platform for the next era of advertising.
The client is endlessly curious. Their team of thinkers, builders, creators, and problem solvers is over 1,000 strong, across 20 markets around the world. Our client’s culture is based on mutual trust, sharing, building, and learning together. They value simplicity, maintainability, automation, and metrics.


About this role:
The client’s team consists of 100+ engineers, designers, data scientists, implementation specialists, and product people, working in small interdisciplinary teams closely with creative agencies, media agencies, and customers to develop and scale their leading digital advertising optimisation suite, which delivers amazing outcomes for brands and audiences.
The client’s platforms are built with Python, React, and Clojure; they are deployed using CI/CD, heavily exploit automation, and run on AWS, GCP, k8s, Snowflake, BigQuery, and more. They serve 9 petabytes and 77 billion objects annually, optimise thousands of campaigns to maximise ROI, and deliver 20 billion ad impressions across the globe. You’ll play a leading role in scaling this significantly further.
As the client’s first Machine Learning Operations (MLOps) Engineer on the team, you will play a pivotal role in bridging the gap between platform engineering, data science, and software engineering, building systems that drive the deployment, monitoring, and scalability of machine learning models. You will design and implement pipelines, automate workflows, and optimise model performance in training and production environments. You’ll lead the creation of processes, the implementation of tools, and the delivery of solutions that mature how machine learning is integrated into the client’s production systems, while maintaining reliability, security, and efficiency. You’ll also play a leading role in driving continuous improvement in model lifecycle management, from development to deployment and monitoring.
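To make the model-lifecycle work concrete, here is a minimal sketch of the kind of automation this role would own, assuming MLflow as the model registry (MLflow is named in the requirements below); the toy model and the registry name "ctr-model" are purely hypothetical:

    import mlflow
    from mlflow.tracking import MlflowClient
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Train a toy model and log it to the MLflow tracking server.
    X, y = make_classification(n_samples=500, random_state=42)
    model = RandomForestClassifier(random_state=42).fit(X, y)

    with mlflow.start_run() as run:
        mlflow.sklearn.log_model(model, artifact_path="model")

    # Register the logged model under a versioned, auditable name
    # ("ctr-model" is a hypothetical placeholder) and promote it.
    version = mlflow.register_model(f"runs:/{run.info.run_id}/model", "ctr-model")
    MlflowClient().transition_model_version_stage(
        name="ctr-model", version=version.version, stage="Production"
    )

In practice a step like this would sit inside a CI/CD pipeline rather than a script, so every promotion is reviewed, versioned, and reversible.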


Requirements

Technical Skills:
• Proficiency in Python for ML development; familiarity with additional languages like Clojure is a plus.
• Expertise in cloud platforms (AWS, GCP) and data warehouses like Snowflake or BigQuery.
• Strong knowledge of MLOps frameworks (e.g., Kubeflow, MLflow) and DevOps tools (e.g., Jenkins, GitLab, Flux).
• Experience with containerization (Docker) and orchestration (Kubernetes).
• Experience with infrastructure-as-code tools like Terraform.
Machine Learning Knowledge:
• Solid understanding of machine learning principles, including model evaluation, explainability, and retraining workflows.
• Hands-on experience with ML frameworks such as TensorFlow or PyTorch.
Big Data Handling:
• Proficiency in SQL/NoSQL databases and distributed computing systems such as Dataproc, EMR, Spark, or Hadoop.
Soft Skills:
• Strong communication skills to collaborate across multidisciplinary teams.
• Problem-solving mindset with the ability to work in agile environments.
Experience:
• 4+ years in platform, software, or MLOps engineering roles.
• Proven track record of deploying scalable ML solutions in production environments.

 

Job responsibilities

Model Deployment and Operations:
• Deploy, monitor, and maintain machine learning models in production environments.
• Automate model training, retraining, versioning, and governance processes.
• Monitor model performance, detect drift, and ensure the scalability and reliability of ML workflows (see the drift-check sketch after this list).
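As a concrete illustration of the drift-detection bullet above, one common approach (not necessarily the client’s) is a two-sample statistical test comparing live feature values against the training baseline; a minimal sketch in Python, with all data and thresholds hypothetical:

    import numpy as np
    from scipy.stats import ks_2samp

    # Two-sample Kolmogorov-Smirnov test: a small p-value suggests the
    # live feature distribution no longer matches the training baseline.
    def detect_drift(train_col: np.ndarray, live_col: np.ndarray,
                     alpha: float = 0.01) -> bool:
        _, p_value = ks_2samp(train_col, live_col)
        return p_value < alpha

    rng = np.random.default_rng(0)
    baseline = rng.normal(0.0, 1.0, 10_000)    # what the model trained on
    production = rng.normal(0.4, 1.0, 10_000)  # shifted live traffic
    if detect_drift(baseline, production):
        print("Drift detected - consider triggering a retraining run")

A per-feature check like this would typically run on a schedule against recent production traffic, with alerts or automated retraining wired to the result.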
Infrastructure and Pipeline Management:
• Design and implement scalable MLOps pipelines for data ingestion, transformation, and model deployment.
• Build infrastructure-as-code solutions using tools like Terraform to manage cloud environments (AWS, GCP).
Collaboration with Teams:
• Work closely with data scientists to operationalize machine learning models.
• Collaborate with software engineers to integrate ML systems into broader platforms.
Cloud and Big Data Expertise:
• Utilize cloud services from AWS, GCP, and Snowflake for scalable data storage and processing.
DevOps Integration:
• Implement CI/CD pipelines and automations to streamline ML model deployment.
• Use containerization tools like Docker and orchestration platforms like Kubernetes for scalable deployments.
• Use observability platforms to monitor the pipeline and operational health of model production, delivery, and execution (a metrics sketch follows below).
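To illustrate the observability bullet above: a model-serving process might export basic health metrics for scraping. A minimal sketch using the Python prometheus_client library; the posting does not name a specific observability stack, and the metric names, port, and model-version label are hypothetical:

    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    # Serving-side health metrics, exposed on /metrics for scraping.
    PREDICTIONS = Counter("model_predictions_total", "Predictions served",
                          ["model_version"])
    LATENCY = Histogram("model_prediction_latency_seconds",
                        "Prediction latency in seconds")

    def predict(features):
        with LATENCY.time():
            PREDICTIONS.labels(model_version="v1").inc()
            time.sleep(random.uniform(0.001, 0.005))  # stand-in for real inference
            return 0.5  # dummy score

    if __name__ == "__main__":
        start_http_server(8000)  # serves http://localhost:8000/metrics
        while True:
            predict(features=None)
            time.sleep(1)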

Published 2 July