Site Reliability Engineer (SRE)

Our client is a remote-first, dynamic international product company in the iGaming field. Currently we’re on the lookout for an experienced Site Reliability Engineer (SRE) for their team.

 

RESPONSIBILITIES:
- Development and implementation of monitoring, alerting and metrics processes;
- Participation in resolving failures and investigating their causes;
- Improving application observability;
- Design, implementation and support of metrics for different levels of monitoring;
- Participation in the design and implementation of fault-tolerant application architecture;
- Organization of rapid response processes to incidents (Incident Response);
- Participation in the development of Disaster Recovery strategies;
- Introduction and popularization of post-incident meetings (Postmortem);
- Conducting root cause analysis (RCA) to prevent recurrence of incidents;
- Building automated problem-response systems;
- Participation in log management for the product;
- Creation of reliability documentation and guides;
- Ownership of processes and approaches to support the uninterrupted operation of services.

 

REQUIREMENTS:
- At least 2 years of experience in an SRE position;
- 5+ years of overall experience in infrastructure management;
- Experience with Linux/Unix environments;
- Strong expertise with containerised workloads and services such as Docker, Docker Swarm and Kubernetes;
- Experience with Cloud solutions (AWS, GCP, or any other);
- Automated deployment tools such as Ansible and Terraform;
- Proven expertise with databases such as MySQL, PostgreSQL, Redis, MongoDB (ClickHouse is a plus);
- Monitoring tools (Grafana, Prometheus, CloudWatch, etc.);
- Logging tools (ELK stack etc.);
- Networking: network topologies and common network protocols and services (TCP/IP, DNS, HTTP(S), SSH, SMTP, IPMI, L2/L3 layers);
- Message brokers such as Kafka, RabbitMQ, etc.;
- Robust skills with CI/CD tools such as GitLab CI;
- Shell (Bash); any relevant programming language (e.g. Golang, Python) is a big plus;
- Storage/filesystems: NFS, RBD, Ceph, ext*, XFS, RAID*;
- Version control: experience administering version control systems such as Git.

 

WE OFFER:

- Remote work from anywhere in the world

- Generous time-off policy (vacation, sick leave, days off, holidays)

- Guaranteed performance reviews & career plan development

- Low bureaucracy level, with decisions made quickly

- Open-minded and easy-going management

- Friendly atmosphere among people who love their work
