DevOps / MLOps Engineer
Requirements:
- Proven experience in deploying and monitoring machine learning models in production environments.
- 5+ years of experience with Docker, Kubernetes, Helm, and CI/CD pipelines, including their best practices.
- 5+ years of experience with observability tools such as Prometheus, Thanos, and Grafana.
- Familiarity with model monitoring tools such as Arize, Evidently AI, and Alibi Detect.
- Experience with A/B testing and service mesh software such as Istio.
- Proficiency in using platforms like Kubeflow and OpenDataHub for model deployment and management.
- Strong understanding of infrastructure monitoring and observability best practices.
- Excellent problem-solving skills and the ability to troubleshoot complex issues.
- Experience with cloud platforms such as AWS, Google Cloud Platform (GCP), or Azure.
- Knowledge of scripting and automation tools (e.g., Bash, Python).
Responsibilities:
- Deploy and manage machine learning models on Kubernetes clusters.
- Develop robust data pipelines for training, inference, and analytics.
- Monitor and manage cluster infrastructure using Prometheus, Thanos, and other observability tools.
- Implement and maintain model observability frameworks using tools like Arize.
- Implement A/B testing strategies, using service mesh software such as Istio, to evaluate model performance and reliability.
- Develop and maintain dashboards and alerting systems to track model performance and data quality metrics.
- Collaborate with data scientists and ML engineers to ensure that models are robustly monitored and that issues are quickly identified and resolved.
- Ensure the scalability and reliability of model deployment pipelines using Kubernetes.
- Stay up to date with the latest advancements in model observability and infrastructure monitoring technologies.
The job ad is no longer active