DevOps Engineer
Data Science UA is a service company with deep expertise in AI and Data Science. Our story started in 2016 with the first Data Science UA Conference in Kyiv, and since then, we’ve built one of the largest AI communities in Europe.
About the role and client:
We're looking for an IT DevOps Engineer to join our client's team. The client is building a production-grade AI platform to power intelligent automation across the enterprise. As the IT DevOps Engineer on the AI Orchestration and Dev team, you are the engineer who makes sure that everything the architects design and the software engineers build actually runs reliably, securely, and at scale. You will own the infrastructure-as-code, CI/CD pipelines, container orchestration, secrets management, and cloud cost governance for the client's IT AI environment.
Responsibilities:
Infrastructure as Code & Cloud Provisioning
- Own the Terraform / Bicep codebase for all of the client's IT AI Azure resources: AKS, Azure OpenAI, APIM, Storage, Key Vault, Service Bus, Azure AI Search, Azure ML, and Entra ID app registrations
- Enforce IaC-only provisioning (no manual resource creation), with drift detection and automated remediation
- Manage multiple environments (dev, staging, prod) with environment-specific configurations and promotion gates
- Design and operate Azure Landing Zone patterns: management groups, policies, tagging standards, and network topology
CI/CD Pipeline Engineering
- Build and maintain GitHub Actions workflows for all AI platform components: Python services, TypeScript tooling, infrastructure modules, and container images
- Implement pipeline stages: lint, unit test, integration test, container build, vulnerability scan, policy check, deploy, smoke test, and auto-rollback
- Manage GitHub repository settings, branch protection rules, required reviewers, and environment secrets
- Integrate GitHub Copilot and Azure AI Foundry tooling into the developer workflow; maintain the internal developer portal
Kubernetes & Container Operations
- Manage AKS clusters: node pool lifecycle, cluster upgrades, and certificate rotation
- Configure Kubernetes networking (CNI, network policies, ingress controllers, private endpoints), storage (PVCs, Azure Files, Azure Disk), and RBAC
- Operate the container registry (ACR): image scanning, retention policies, geo-replication, and pull-through cache
- Implement GitOps deployment patterns for continuous delivery to AKS
MLOps / LLMOps Platform
- Build and operate model deployment pipelines: package LLM endpoints and manage Azure OpenAI deployments
- Manage Azure ML workspaces: compute clusters, environments, datasets, and model registry
- Operate MLflow or Azure ML experiment tracking
Identity, Secrets & Security
- Manage Azure Key Vault: secret rotation, access policies, diagnostic logging, and integration with Kubernetes via the CSI driver
- Configure Managed Identities and Workload Identity for all platform services, with no password-based service principals
- Implement Entra ID Conditional Access and PIM for privileged access to production resources
- Run regular access reviews and enforce least-privilege across all Azure RBAC assignments
Observability & Incident Response
- Design the observability stack: Azure Monitor, Log Analytics, Application Insights, and OpenTelemetry collectors
- Build dashboards covering infrastructure health, AI inference latency and error rates, token spend, agent success rates, and pipeline health
- Define and own SLOs for platform services; create actionable alert runbooks and on-call playbooks
- Lead incident response for platform outages: triage, mitigation, post-mortem, and remediation tracking
FinOps & Cost Engineering
- Build Azure Cost Management dashboards with per-team, per-workload, and per-model cost breakdowns
- Enforce resource tagging via Azure Policy and run monthly cost reviews with engineering teams
- Model and forecast infrastructure costs for new platform capabilities before they are built
Requirements:
- 4+ years of DevOps or platform engineering experience in cloud-native environments
- Hands-on Azure expertise: AKS, Azure Monitor, Azure Networking, Key Vault, Entra ID, and at least one AI/ML service (Azure OpenAI, Azure ML, or Cognitive Services)
- Strong IaC skills: Terraform or Bicep, with real production codebases
- Deep GitHub Actions experience: multi-stage pipelines, reusable workflows, environment approvals, and secrets management
- Solid Kubernetes knowledge: deployments, services, ingress, RBAC, node pools, and cluster operations
- Experience with container lifecycle: Dockerfile authoring, image scanning, ACR, and deployment patterns
- Understanding of cloud security fundamentals: IAM, network segmentation, secrets rotation, and audit logging
- Strong incident response instincts: debugging distributed systems, reading logs and metrics under pressure
- Fluency in English
Nice-to-have:
- Experience with MLOps or LLMOps: model deployment pipelines, Azure ML, MLflow, or equivalent
- GitOps experience in AKS environments
- FinOps practitioner knowledge: cost allocation, chargeback models, reserved capacity management
- Experience with Azure Policy, Management Groups, and Landing Zone patterns at enterprise scale
- Scripting ability in Python or Bash for automation and tooling
- Familiarity with AI developer tools: GitHub Copilot, Azure AI Foundry, or Anthropic Claude API infrastructure
- Microsoft certifications: AZ-104 (Administrator), AZ-400 (DevOps Engineer Expert), or AZ-305 (Architect)