Site Reliability Engineer Offline

Your daily adventures as a Site Reliability Engineer will include:

- Build tools to measure the availability and performance of our systems

- Implement and maintain observability and monitoring tools across the entire stack: logging systems and aggregators, metrics monitoring tools, distributed tracing platforms, etc.

- Manage monitoring systems to support customer metrics delivery dashboards

- Deliver easy-to-digest dashboards with different levels of metrics and data for internal teams and customers

- Provide subject-matter expertise on monitoring summaries and reporting, especially during critical incidents

- Educate software and infrastructure engineering teams on observability and monitoring best practices

- Monitor and adjust key infrastructure KPIs, SLOs, and SLAs to reflect changes in the environment

- Implement monitoring as code so that changes can be easily audited and maintained

- Identify technology gaps and collaboratively implement appropriate solutions

You are the best fit for us at Admirals if you have:

~ Experience in a similar position (3+ years)

~ Proficiency in metrics monitoring systems, such as Grafana, Prometheus, or Zabbix

~ Experience with logging systems and concepts, such as the ELK stack

~ Proficiency in distributed tracing platforms, such as Jaeger or Zipkin

~ Experience working in a DevOps environment

~ Good knowledge of Git

~ English at Intermediate+ level

The job ad is no longer active

Look at the current jobs (Other) Remote→