MLOps Engineer
About us:
Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with the organization of the first Data Science UA conference, setting the foundation for our growth. Over the past 8 years, we have diligently fostered the largest Data Science community in Eastern Europe, boasting a network of over 30,000 top AI engineers.
About the client:
Our client is a leading innovator in artificial intelligence solutions, specializing in AI-driven chatbot technologies that revolutionize human-machine interactions.
About the role:
We are looking for a highly skilled and experienced MLOps Engineer to join the team. This role is ideal for someone with strong expertise in designing, deploying, and managing advanced ML systems with a focus on automation and reliability.
Requirements:
- Minimum 5 years in an MLOps Engineer role or a similar position.
- Strong proficiency in Python and related ML libraries (PyTorch, Hugging Face Transformers).
- Extensive experience in implementing engineering best practices and a deep understanding of Machine Learning fundamentals.
- Hands-on experience with technologies like Apache Spark (Spark SQL, MLlib/Spark ML) or similar big data frameworks, and proficiency in tools such as Hadoop, Kafka, Cassandra, GCP BigQuery, AWS Redshift, Apache Beam, Apache Flink, etc.
- Experience with automated data pipeline and workflow tools, such as Airflow, Argo Workflows, Kubeflow, etc.
- Practical experience with major cloud providers, including AWS, GCP, or Azure.
- Proficiency in one or more MLOps platforms/technologies such as AWS SageMaker, Azure ML, GCP Vertex AI, Databricks, MLflow, Kubeflow, or TensorFlow Extended (TFX).
Would be a plus:
- Experience with Large Language Models (LLMs) and computer vision applications, including image generation tools, Speech-to-Text (STT), Speech-to-Speech (STS), and Text-to-Speech (TTS).
- AWS, GCP, or Azure certifications are a strong advantage.
- Ability to quickly adapt to new technologies and environments, with a startup mindset for handling ambiguity and fast-paced change.
Responsibilities:
- Implement and maintain CI/CD pipelines: set up and ensure smooth operation of continuous integration and continuous delivery for AI and machine learning projects.
- Establish reliable strategies for deploying AI models, with a focus on LLMs (Large Language Models) and Retrieval-Augmented Generation (RAG).
- Track deployed AI models' reliability, availability, and performance to ensure optimal operation.
- Work closely with AI teams to transition machine learning models and algorithms into production environments efficiently.
- Promote the use of version control, configuration management, and testing protocols for AI-driven solutions.
- Utilize MLOps tools: leverage frameworks such as Kubeflow, MLflow, or TensorFlow Extended (TFX) to manage the machine learning lifecycle, from experimentation to production.
- Set up monitoring systems for infrastructure metrics and AI model performance to enable early issue detection.
- Engage in on-call rotations using Site Reliability Engineering (SRE) principles to ensure uptime and meet service-level objectives (SLOs).
The company offers:
- A collaborative, innovative environment where your contributions make a difference.
- The chance to work with a passionate team of data scientists, engineers, product managers, and designers.
- A culture that values learning, growth, and the pursuit of excellence.
- This role requires relocation to Dubai after successful completion of the probation period. The client highly values on-site collaboration and expects team members to work from the company’s office to ensure alignment, productivity, and team synergy.
- To support a smooth transition, the company provides full relocation assistance and offers a relocation bonus to help candidates comfortably settle in and adapt to the new environment.