DevOps Engineer
We are looking for a DevOps Engineer to join our company.
Basic requirements for the DevOps Engineer:
- 7+ years of experience in DevOps / Platform Engineering / SRE
- 5+ years of hands-on AWS experience in production environments (EC2, VPC, S3, IAM, Route53)
- 3+ years of Terraform experience, including reusable modules, state management, and CI/CD integration
- 3+ years of Ansible experience - playbooks, roles, inventory management for multi-node deployments
- Strong hands-on experience with GitHub Actions CI/CD - workflows, Docker Buildx, automated builds, and deployments
- Proficiency in Bash and Python scripting - bootstrap automation, service auto-detection, idempotent deployment scripts
- Strong Linux administration skills in production (Ubuntu, Debian, CentOS, Rocky Linux)
- Networking, firewalls (iptables/ufw), VPN, SSH hardening, SSL/TLS certificate management
- Hands-on experience with Docker and Docker Compose at an advanced level - host networking, volume management, multi-service stacks, private registries (Harbor)
- Hands-on experience with Kubernetes and AWS EKS in production - Helm, cluster management, service mesh, ingress controllers
- Deep experience with Prometheus - configuration, file_sd_configs, PromQL, alerting rules, retention tuning
- Grafana - dashboard provisioning, datasources, alerting, service accounts, embedding
- Alertmanager - routing, receivers (Email, Telegram, Slack, MS Teams, SNMP, Webhooks)
- Experience with the exporter ecosystem - node-exporter, process-exporter, cAdvisor, mysqld-exporter, JMX exporter
- Understanding of JVM metrics (HikariCP, heap, GC) - we monitor WildFly/Java application servers
- Nginx - reverse proxy, SSL termination, basic auth, WebSocket support, IP-based access control
- Database administration: MySQL/MariaDB (setup, user management, monitoring) plus at least one of PostgreSQL, Oracle DB, ClickHouse
- Experience with IAM, secrets management, and infrastructure security best practices
- Experience supporting production systems, incident response, and root cause analysis
- Confident Git skills (branching, tags, hooks, releases)
- Strong communication skills and ability to work across teams
- English level: B2+
Nice to Have
- AWS Certified DevOps Engineer - Professional or AWS Certified Solutions Architect - Associate
- CKA / CKAD
- Experience with Hetzner (Dedicated/Cloud)
- Experience with GitOps tools (Argo CD)
- Experience with AWS Organizations / multi-account environments
- Experience with cost optimization / FinOps
- Experience mentoring engineers and defining engineering standards
- Spring Boot Actuator - understanding Java microservice metrics
- SNMP - traps, MIB files, SNMPv2c/v3
- Hazelcast - cache clusters, native Prometheus endpoint
- Oracle DB - exporter setup, administration
- ELK/OpenSearch for centralized logging
- Cloud-to-cloud migration experience
Responsibilities
- Designing, building, and maintaining software delivery pipelines and infrastructure that support continuous integration, delivery, and deployment
- Building and maintaining Terraform modules and Ansible playbooks for automated product delivery to enterprise clients across AWS, Hetzner, and on-premise environments
- Collaborating with engineering teams to ensure that software is delivered with high quality, speed, and reliability
- Developing and maintaining monitoring and alerting systems (Prometheus, Grafana, Alertmanager, ~20 exporter types) to proactively identify and address issues in production environments
- Troubleshooting production issues, conducting root cause analysis, and implementing remediation plans
- Managing and scaling infrastructure resources - servers, databases, Docker services, EKS/Kubernetes clusters - to ensure optimal performance and cost-effectiveness
- Owning the full monitoring stack: Prometheus, Grafana, Alertmanager, Nginx reverse proxy, SNMP Notifier, and notification channels (Email, Telegram, Slack, MS Teams, Webhooks, SMS)
- Packaging and automating product deployments for client environments - repeatable, infrastructure-as-code driven delivery using Terraform and Ansible
- Implementing security best practices and ensuring compliance with industry standards and regulations
- Maintaining and improving deployment automation scripts (Bash) for multi-node bootstrap, service detection, and rolling updates
- Documenting infrastructure, deployment procedures, and runbooks - we maintain strict documentation standards
- Continuously learning and keeping up-to-date with new technologies and industry trends to improve system performance, security, and efficiency
Technology Landscape
Monitoring Stack: Prometheus, Grafana, Alertmanager, Nginx, SNMP Notifier
Application Servers: WildFly/FTACS, Hazelcast, Spring Boot microservices, IIS (Windows)
Databases: MySQL, PostgreSQL, Oracle, ClickHouse
Frontend: Angular, FastAPI (config UI)
Infrastructure: Hetzner Dedicated + Cloud, AWS (EKS, EC2, VPC, S3, Route53), Docker Compose, Kubernetes, GitHub Actions
IaC & Delivery: Terraform, Ansible
Notifications: Email (SMTP), Telegram, Slack, MS Teams, SNMP Traps, Webhooks, SMS
Exporters (~20 types): node, process, cAdvisor, MySQL, Oracle, JMX, ClickHouse, Postgres, Nginx, API metrics, and more
Required languages
English: B2 (Upper Intermediate)