Site Reliability Engineer (offline)

Your daily adventures as a Site Reliability Engineer will be:

- Build tools to measure the availability and performance of our systems
- Implement and maintain observability and monitoring tools across the entire stack: logging systems and aggregators, metrics monitoring tools, distributed tracing platforms, etc.
- Manage monitoring systems to support customer metrics delivery dashboards
- Deliver easy-to-digest dashboards containing differing levels of metrics and data for internal teams and customers
- Provide subject matter expertise on monitoring summarization and reporting, especially during critical incidents
- Educate software and infrastructure engineering teams on observability and monitoring best practices
- Monitor and adjust key infrastructure KPIs, SLOs, and SLAs to reflect changes in the environment
- Implement monitoring as code so changes can be easily audited and maintained
- Identify technology gaps and implement appropriate solutions collaboratively

You are a great fit for us at Admirals if you have:

~ Experience in a similar position (3+ years)
~ Proficiency in metrics monitoring systems, such as Grafana, Prometheus, Zabbix
~ Experience in logging systems and concepts, such as ELK
~ Proficiency in distributed tracing platforms, such as Jaeger, Zipkin
~ Experience working in a DevOps environment
~ Good knowledge of Git
~ Knowledge of English at Intermediate+ level

The job ad is no longer active
