DevOps

$$$
Product

We are looking for a DevOps Engineer to join our team and take ownership of our infrastructure.
You will work with a modern Kubernetes-based environment, design scalable and reliable systems, and drive the adoption of AI-powered tools and practices across our workflows.

 

Your Responsibilities

  • Design, maintain, and evolve Kubernetes cluster infrastructure and supporting services
  • Architect and optimize infrastructure (Hetzner) with a focus on scalability, fault tolerance, and cost control
  • Administer databases (PostgreSQL, MongoDB, Redis, ClickHouse): replication, backups, performance monitoring, migrations
  • Develop and maintain CI/CD pipelines (GitLab CI); implement GitOps practices
  • Manage incidents proactively: run post-mortem analysis, implement preventive measures, and mentor the team in troubleshooting
  • Build and evolve monitoring and alerting systems (Grafana, Prometheus, InfluxDB); define SLIs/SLOs
  • Provide technical support to staff; mentor junior/middle DevOps engineers
  • Participate in on-call rotations to ensure infrastructure availability

 

AI Adoption & AI Infrastructure

  • Deploy and administer an AI gateway (LiteLLM) for unified access to LLM providers (OpenAI, Anthropic, GCP Vertex AI, AWS Bedrock, etc.)
  • Configure routing, fallbacks, rate limiting, cost tracking, and logging via LiteLLM Proxy
  • Integrate AI tools into team workflows: AI code review, AI-assisted incident management, routine task automation
  • Deploy and maintain infrastructure for LLM/ML models on Kubernetes (GPU scheduling, model serving)
  • Monitor AI service performance and costs; optimize token usage and latency
  • Research and adopt new AI/ML tools that improve DevOps efficiency

We are open to candidates bringing their own tools and best practices in the AI space and integrating them into our infrastructure.

 

Required Skills: Administration & Configuration

  • Operating System: Ubuntu (deep understanding of systemd, networking, security hardening)
  • Databases: PostgreSQL, MongoDB, Redis, ClickHouse (replication, backups, monitoring, performance tuning)
  • Containerization: Docker (optimized images, multi-stage builds, security best practices)
  • Orchestration: Kubernetes (Helm, RBAC, network policies, HPA/VPA, cluster upgrades, troubleshooting)
  • CI/CD: GitLab CI/CD (advanced pipelines, environments, security scanning, GitOps)
  • Monitoring: Grafana, Prometheus, InfluxDB; dashboards, alerts, SLI/SLO definition
  • AI Infrastructure: LiteLLM, experience with LLM API Gateways, understanding of LLM providers and their APIs

 

Tooling Skills

  • Linux: firewall (iptables/nftables), DNS, network diagnostics, filesystems, performance tuning
  • Cloud Providers: Hetzner, AWS (EC2, S3, RDS, IAM, VPC, EKS)
  • Cloudflare: DNS, WAF, CDN, Workers, Zero Trust
  • Atlassian: JIRA, Confluence (documentation and process management)
  • Languages: Python, Bash (automation, scripting); basic understanding of PHP

 

Nice to Have

  • Kafka, RabbitMQ (production experience)
  • Deep understanding of computer architecture, processors, filesystems
  • Networking protocols (TCP/IP, HTTP/2, gRPC, TLS)
  • Experience with GPU infrastructure for ML/AI workloads
  • MLOps experience: model serving (vLLM, TGI, Triton), experiment tracking
  • Experience with MCP (Model Context Protocol), AI agents, RAG architecture
  • Service mesh (Istio, Linkerd)
  • HashiCorp Vault, cert-manager

 

Additional Requirements

  • English: Upper Intermediate+ (confident reading of logs and technical documentation, writing documentation, communicating with English-speaking colleagues)
  • Experience mentoring and conducting code/infrastructure reviews
  • Ability to write post-mortems and technical documentation
  • Proactive mindset: ability to identify problems before they occur

 

Required languages

English B2 - Upper Intermediate
Ukrainian Native
Published 13 April