Senior DevOps / SRE Engineer (MLOps platform)
Ukrainian Product
🇺🇦
Our project focuses on building and operating a scalable MLOps platform based on Kubeflow and KServe, enabling end-to-end machine learning workflows, from experimentation to production inference.
The platform is deployed on Kubernetes and integrates modern DevOps/SRE practices, including automation, observability, and reliability engineering. We are actively evolving the system towards cloud-native architectures with a strong focus on performance, scalability, and cost efficiency.
This is a great opportunity to work on production-grade ML infrastructure, influence platform design, and build reliable solutions for real business needs.
Responsibilities
- Building software services for Data Science, Data Engineering, and Developer teams.
- Resolving escalated support cases.
- Improving on-call rotations and processes.
- Documenting knowledge in a form that is ready to share.
- Conducting post-incident reviews that lead to real improvements.
- Delivering solutions that leverage the best automation tools on offer.
- Maintaining mission-critical ML services.
- Participating in bootstrapping new ML teams/projects.
Requirements
- Experience with GitLab;
- Experience with Continuous Integration;
- Experience with AWS;
- Experience with Azure AD (understanding of authentication);
- Experience with Terraform;
- Experience with ML workflows (will be a plus);
- Experience maintaining and/or developing AI agents (will be a plus);
- Eagerness to take responsibility, accountability, and ownership of systems and processes;
- Experience using Atlassian products: Jira, Confluence;
- Experience using Service Management (OpsGenie) or PagerDuty;
- 3+ years of industry experience;
- Expertise in using REST and SOAP APIs;
- Knowledge of data security: PGP, SSH, OAuth, SFTP, HTTPS;
- Basic FinOps experience (will be a plus);
- Knowledge of the OWASP Top 10.
Technologies that we use
- AWS Cloud
- ECS, EKS, Kubernetes, Prometheus, Grafana, Kubeflow, ArgoCD, Kyverno
- GitLab
- Terraform
- Kustomize/Helm
We offer
- The opportunity to work on a big and scalable MLOps platform.
- Very skilled teammates and a positive atmosphere.
- We are not tied to the office and are open to remote work.
- Health insurance.
- Compensation for sports club memberships and foreign language courses.
- Internal training (IT and beyond).
Required skills experience
| AWS | 3 years |
| Kubernetes | 3 years |
| Terraform | 3 years |
| GitLab | 3 years |
Required languages
| English | B1 - Intermediate |
| Ukrainian | Native |