SRE Engineer
We are looking for an SRE Engineer to join our team!
Requirements:
- Experience with Kubernetes and Helm;
- Familiarity with infrastructure management using Terraform and Ansible;
- Awareness of AWS services such as VPC, IAM, S3, EC2, RDS, SM, SSM, EKS, ECR;
- Ability to read and understand Golang code;
- Experience with Linux systems using systemd and Bash (POSIX shell);
- Exposure to CI/CD pipelines using GitLab CI, ArgoCD, or similar tools;
- Basic knowledge of monitoring stacks, with a preference for Grafana.
Will be a plus:
- Understanding of large-scale systems management and design;
- Experience in building software applications.
Soft Skills:
- Able to clarify task requirements, estimate implementation time, and deliver within the agreed timeline;
- Fluent communication skills, able to contribute to documentation.
Responsibilities:
Incident management
– Support and resolve incidents in the production environment, including participation in postmortems and incident cause analysis
– Participate in the development of disaster recovery plan (DRP) practices: document scenarios, conduct training
– Participate in the setup and diagnosis of replication and backups according to existing policies
– Keep the backup and recovery scheme up to date and reliable
Release management
– Define and maintain release pipelines for the company's applications
– Ensure smooth deployments with minimal downtime
– Implement different deployment strategies (e.g., blue-green, canary, rolling updates) based on the specific needs of each service
– Establish tagging conventions for releases and track changes across different environments (development, staging, production)
– Manage artifacts (e.g., Docker images, Helm charts) in a central repository
– Automate the process of publishing and promoting artifacts through different stages
– Maintain the configuration of production applications
– Define clear promotion paths (e.g., dev → staging → production)
– Automate environment promotion to ensure consistency and reduce manual errors
– Establish rollback procedures in case of issues during deployment
Infrastructure management
– Support the existing infrastructure
– Optimize infrastructure and system performance
– Develop and scale the system
– Set up new infrastructure: environments, clusters, instances, services, etc.
– Contribute to the company's Terraform, Helm, and ArgoCD modules
– Develop and implement metrics and dashboards to improve service observability in line with existing policies
– Integrate and enhance tracking of service and product uptime metrics
– Analyze existing processes, then develop and implement automated solutions
Uptime tracking and reliability practices
– Contribute to the development of uptime tracking strategies
– Set up monitoring and alerting for releases, infrastructure, and critical systems
– Review and provide feedback on internal infrastructure, applications, and modules
– Implement new tools, approaches, and practices
– Communicate with the team daily
Our Stack:
Ansible, ArgoCD, AWS (VPC, IAM, S3, EC2), AWS SM/SSM, bash, posix, GitLab, GitLab CI, Golang, LGTM, K8S, AWS EKS, Linux, systemd, Loki, OpenTelemetry, Terraform
Our benefits to you:
☘️An exciting and challenging job in a fast-growing holding, the opportunity to be part of a multicultural team of top professionals in Development, Architecture, Management, Operations, Marketing, Legal, Finance and more
🤝🏻A great working atmosphere with passionate experts and leaders who share a friendly culture and a success-driven mindset
🧑🏻💻Modern corporate equipment based on macOS or Windows and additional equipment are provided
🏖Paid vacations, sick leave, personal events days, days off
💵Referral program: enjoy working with your colleagues and get a bonus
📚Educational programs: regular internal training sessions, compensation for external education, attendance at specialized global conferences
🎯Rewards program for mentoring and coaching colleagues
🗣Free internal English courses
✈️In-house Travel Service
🦄Multiple internal activities: online platform for employees with quests, gamification, presents and news, PIN-UP clubs for movie, book, and pet lovers, and more
🎳Other benefits may be available depending on your location
Required skills:
- ArgoCD
- IAM
- S3
- bash
- GitLab
- GitLab CI/CD
Required languages:
- English: B1 (Intermediate)