Sysadmin / SRE (L2 Support) - Crypto Trading Platform
We are looking for a Sysadmin / Site Reliability Engineer (L2 Support) to join a distributed operations team supporting a large-scale, high-load crypto trading platform. This is a hands-on operational role with deep involvement in incident management, platform reliability, and infrastructure observability, working closely with DevOps and DevSecOps teams. You will be responsible for keeping critical trading infrastructure stable, observable, and resilient across Azure cloud and on-prem environments, operating at enterprise scale (~1M users).
Details
Location: Fully remote
Employment Type: Full-time
Start Date: ASAP
Language: English - Upper-Intermediate or higher
Domain: Crypto / Trading / FinTech
Key Responsibilities
- Act as L2 operations support for production infrastructure with strict uptime and performance requirements
- Manage and resolve infrastructure incidents, participate in root cause analysis (RCA), and drive permanent fixes
- Own and maintain the CMDB, ensuring the accuracy and consistency of infrastructure asset records
- Proactively monitor platform health: capacity, availability, performance, and security
- Operate and continuously improve the monitoring and observability stack
- Collaborate closely with DevOps and DevSecOps teams on reliability, automation, and security initiatives
- Participate in disaster recovery planning, testing, and execution
- Support release activities and operational readiness for new services and infrastructure changes
- Ensure proper documentation of incidents, procedures, and operational playbooks
Requirements
- Proven experience in Sysadmin, SRE, or L2 Operations roles in production environments
- Strong hands-on experience with Azure infrastructure and hybrid (cloud + on-prem) setups
- Solid understanding of high-availability and large-scale systems
- Practical experience with monitoring and observability tools: Grafana, Prometheus, Azure Monitor & Application Insights, New Relic, Loki
- Experience with incident management, on-call rotations, and post-incident analysis
- Understanding of capacity planning, reliability engineering, and infrastructure security basics
- Ability to work calmly under pressure in mission-critical environments
Nice to Have
- Experience in crypto, trading, fintech, or other real-time systems
- Familiarity with SRE best practices and error budgets
- Exposure to DevSecOps processes
- Experience improving observability, alert quality, and noise reduction
- Scripting or automation skills (Bash, Python, PowerShell, etc.)
Why Join
- Work on a high-impact crypto trading platform with real technical challenges
- Fully remote, distributed engineering culture
- Direct influence on platform stability and reliability at scale
- Close collaboration with strong DevOps and security teams
Required languages
English: C1 - Advanced
Ukrainian: Native