DevOps
$$$
Product
We are looking for a DevOps Engineer to join our team and take ownership of our infrastructure.
You will work with a modern Kubernetes-based environment, design scalable and reliable systems, and drive the adoption of AI-powered tools and practices across our workflows.
Your Responsibilities
- Design, maintain, and evolve Kubernetes cluster infrastructure and supporting services
- Architect and optimize infrastructure (Hetzner) with a focus on scalability, fault tolerance, and cost control
- Administer databases (PostgreSQL, MongoDB, Redis, ClickHouse): replication, backups, performance monitoring, migrations
- Develop and maintain CI/CD pipelines (GitLab CI); implement GitOps practices
- Manage incidents proactively, run post-mortem analyses, and implement preventive measures; mentor the team in troubleshooting
- Build and evolve monitoring and alerting systems (Grafana, Prometheus, InfluxDB); define SLIs/SLOs
- Provide technical support to staff; mentor junior/middle DevOps engineers
- Participate in on-call rotations to ensure infrastructure availability
AI Adoption & AI Infrastructure
- Deploy and administer an AI gateway (LiteLLM) for unified access to LLM providers (OpenAI, Anthropic, GCP Vertex AI, AWS Bedrock, etc.)
- Configure routing, fallbacks, rate limiting, cost tracking, and logging via LiteLLM Proxy
- Integrate AI tools into team workflows: AI code review, AI-assisted incident management, routine task automation
- Deploy and maintain infrastructure for LLM/ML models on Kubernetes (GPU scheduling, model serving)
- Monitor AI service performance and costs; optimize token usage and latency
- Research and adopt new AI/ML tools that improve DevOps efficiency
We are open to candidates bringing their own tools and best practices in the AI space and integrating them into our infrastructure.
Required Skills: Administration & Configuration
- Operating System: Ubuntu (deep understanding of systemd, networking, security hardening)
- Databases: PostgreSQL, MongoDB, Redis, ClickHouse (replication, backups, monitoring, performance tuning)
- Containerization: Docker (optimized images, multi-stage builds, security best practices)
- Orchestration: Kubernetes (Helm, RBAC, network policies, HPA/VPA, cluster upgrades, troubleshooting)
- CI/CD: GitLab CI/CD (advanced pipelines, environments, security scanning, GitOps)
- Monitoring: Grafana, Prometheus, InfluxDB; dashboards, alerts, SLI/SLO definition
- AI Infrastructure: LiteLLM; experience with LLM API gateways; understanding of LLM providers and their APIs
Tooling Skills
- Linux: firewall (iptables/nftables), DNS, network diagnostics, filesystems, performance tuning
- Cloud Providers: Hetzner, AWS (EC2, S3, RDS, IAM, VPC, EKS)
- Cloudflare: DNS, WAF, CDN, Workers, Zero Trust
- Atlassian: Jira, Confluence (documentation and process management)
- Languages: Python, Bash (automation, scripting); basic understanding of PHP
Nice to Have
- Kafka, RabbitMQ (production experience)
- Deep understanding of computer architecture, processors, filesystems
- Networking protocols (TCP/IP, HTTP/2, gRPC, TLS)
- Experience with GPU infrastructure for ML/AI workloads
- MLOps experience: model serving (vLLM, TGI, Triton), experiment tracking
- Experience with MCP (Model Context Protocol), AI agents, RAG architecture
- Service mesh (Istio, Linkerd)
- HashiCorp Vault, cert-manager
Additional Requirements
- English: Upper Intermediate+ (confident reading of logs and technical documentation, writing documentation, communicating with English-speaking colleagues)
- Experience mentoring and conducting code/infrastructure reviews
- Ability to write post-mortems and technical documentation
- Proactive mindset: ability to identify problems before they occur
Required languages
| English | B2 - Upper Intermediate |
| Ukrainian | Native |
Published 13 April