OnDuty Engineer (Sysadmin)
Product
We need an on-call engineer to monitor and support our service. This is a front-line position: you work strictly according to runbooks, document everything that happens, and escalate non-standard situations to a Senior Operations Engineer. The key requirements for this role are attentiveness, discipline, and the ability to describe a problem clearly.
Tasks
- Continuous monitoring of services and infrastructure
- Responding to alerts strictly according to runbooks
- Initial incident diagnostics: checking availability, logs, and service status
- Escalation to a Senior Operations Engineer if runbook scope is exceeded
- Maintaining event and incident logs; providing timely status updates
Requirements
- Linux, command line: SSH, log navigation (journalctl, tail, grep), service management (systemctl), basic load and disk space diagnostics (top/htop, df, du)
- Network, basic: host and port availability checks (ping, curl, nc/telnet), understanding DNS, assessing whether a service is alive or not
- Infrastructure, basic: understanding the difference between a physical host and a VM; understanding out-of-band access (IPMI/BMC); basic familiarity with the cloud console (instance status, metrics)
- Monitoring and dashboards: reading metrics and graphs (Grafana or similar); understanding alerts, severity, and thresholds; ability to distinguish a real incident from a false positive
- NGINX: reading configs, working with logs, restarting
- MySQL: basic read-only queries, checking replication, reading slow logs
- Docker / Docker Compose: container status, reading logs, restarting, basic reading of compose files
- Working with LLM assistants (Claude, Cursor, etc.): using them for diagnostics, finding solutions, and documentation
- English for reading technical documentation and alerts
- Ability to clearly and concisely describe a problem in writing
- At least 1 year of experience in a sysadmin, support, or operations role
Nice to have
- Physical server administration: IPMI / iDRAC / iLO (remote reset, console access, hardware testing)
- Hypervisors: KVM / Proxmox / VMware or similar (VM lifecycle management)
- Clouds (GCP, AWS, Azure, Yandex Cloud): instances, disks, networks, metrics, and logs in the console
- On-call systems: PagerDuty, OpsGenie, or similar
- Understanding Prometheus-style monitoring (probe, metric, alert rules)
Required languages
- Russian: native
Published 2 June