Senior DevOps / SRE Engineer (MLOps platform)

Ukrainian Product 🇺🇦

Our project focuses on building and operating a scalable MLOps platform based on Kubeflow and KServe, enabling end-to-end machine learning workflows, from experimentation to production inference.

The platform is deployed on Kubernetes and integrates modern DevOps/SRE practices, including automation, observability, and reliability engineering. We are actively evolving the system towards cloud-native architectures with a strong focus on performance, scalability, and cost efficiency. 

This is a great opportunity to work on production-grade ML infrastructure, influence platform design, and build reliable solutions for real business needs. 

 

Responsibilities 

  • Building software services for Data Science, Data Engineering, and Developer teams.
  • Resolving support escalation cases.
  • Improving on-call rotations and processes.
  • Documenting knowledge in a share-ready form.
  • Conducting post-incident reviews that help.
  • Delivering solutions that leverage the best automation tools available.
  • Maintaining mission-critical ML services.
  • Helping bootstrap new ML teams and projects.
     

Requirements 

  • Experience with GitLab;
  • Experience with Continuous Integration;
  • Experience with AWS;
  • Experience with Azure AD (understanding of authentication);
  • Experience with Terraform;
  • Experience with ML workflows (will be a plus);
  • Experience maintaining and/or developing AI agents (will be a plus);
  • Eagerness to take responsibility, accountability, and ownership of systems and processes;
  • Experience using Atlassian products: Jira, Confluence;
  • Experience using Service Management (OpsGenie) or PagerDuty;
  • 3+ years of industry experience;
  • Expertise in using REST and SOAP APIs;
  • Knowledge of data security, including PGP, SSH, OAuth, SFTP, and HTTPS;
  • Basic FinOps experience (will be a plus);
  • Knowledge of the OWASP Top 10.

 

Technologies that we use 

  • AWS Cloud  
  • ECS, EKS, Kubernetes, Prometheus, Grafana, Kubeflow, ArgoCD, Kyverno 
  • GitLab 
  • Terraform 
  • Kustomize/Helm 
     

We offer 

  • The opportunity to work on a large, scalable MLOps platform.
  • Highly skilled teammates and a positive atmosphere.
  • We are not tied to the office; remote work is welcome.
  • Health insurance.
  • Compensation for sports clubs and foreign language schools.
  • Internal training (IT and beyond).

Required skills experience

AWS 3 years
Kubernetes 3 years
Terraform 3 years
GitLab 3 years

Required languages

English B1 - Intermediate
Ukrainian Native
Published 20 March