Head of Reliability

RedCore

We are looking for a Head of Reliability to join our teams!

Skills and Experience:

- Experience working in regulated or high-availability industries (finance, iGaming, telco, SaaS).
- Proven track record in building and leading SRE and/or Incident Management teams. Experience in technical support, DevOps, or SRE-related roles.
- Proven experience leading SRE or reliability engineering functions in multi-product environments.
- Deep expertise in incident response, escalation management, and on-call operations.
- Experience with monitoring tools (Grafana, Datadog, etc.). Strong background in distributed systems, cloud platforms (AWS and GCP), Kubernetes, databases, and networking.
- Prior leadership of geographically distributed operations teams.

Nice to have: experience integrating AI-driven observability and automation into incident workflows.

Qualifications:

Education: Bachelor’s / Master’s degree in Computer Science, Software Engineering, or Management.
Experience: At least 10 years of experience in SRE, infrastructure, or operations, with at least 5 years in leadership roles.

Soft Skills:

- Exceptional communication skills and the ability to lead under pressure.
- Strong mentoring skills; ability to lead, motivate, and mentor a team.
- Ability to communicate clearly with both technical and non-technical users.

Responsibilities:
1. Leadership & Strategy
Define and execute the combined SRE and Incident Management vision, strategy, and roadmap.
Build, mentor, and retain a high-performing global SRE team, with a culture of accountability and continuous improvement.
Establish and enforce service reliability goals (SLIs/SLOs/SLAs) across all product domains.
Act as the executive escalation point for high-severity incidents.
Provide transparent reporting to stakeholders on incident trends, reliability posture, and team performance.
Influence business continuity and disaster recovery strategies. Participate in post-mortems and recommend preventive measures.

2. Operational Excellence and Incident Management Ownership
Drive adoption of self-healing systems and proactive monitoring. Standardize observability practices across all products (metrics, logs, tracing).
Partner with product engineering, infrastructure, and security teams on resilience, compliance, and risk mitigation. Collaborate with Engineering to identify root causes.
Develop and enforce best practices in incident management, root cause analysis, and post-mortems. Identify recurring issues and suggest automation or reliability improvements.
Own service reliability goals (SLIs/SLOs/SLAs) across multiple domains. Own the full incident management lifecycle: detection, triage, escalation, resolution, communication, and closure.
Establish clear severity levels, escalation policies, and on-call rotations across teams. Ensure incidents are logged, tracked, and resolved according to defined SLAs.

3. Team Management and Development
Plan and allocate engineering resources effectively. Support the teams in their day-to-day work; ensure engineers have clear technical goals.
Guide professional development of engineers in their technical careers, and ensure the teams are empowered to deliver high-quality solutions and service.
Provide educational support to close knowledge gaps; arrange training for teams where needed.
Foster a collaborative, inclusive, and high-performing team culture; manage conflicts within and among teams.

Our benefits to you:

☘️An exciting and challenging job in a fast-growing holding, with the opportunity to be part of a multicultural team of top professionals in Development, Architecture, Management, Operations, Marketing, Legal, Finance and more
🤝🏻A great working atmosphere with passionate experts and leaders who share a friendly culture and a success-driven mindset
🧑🏻‍💻Modern corporate equipment based on macOS or Windows and additional equipment are provided
🏖Paid vacations, sick leave, personal events days, days off
💵Referral program: recommend colleagues to work with and receive a bonus
📚Educational programs: regular internal training sessions, compensation for external education, attendance of specialized global conferences
🎯Rewards program for mentoring and coaching colleagues
🗣Free internal English courses
✈️In-house Travel Service
🦄Multiple internal activities: online platform for employees with quests, gamification, presents and news, PIN-UP clubs for movie, book, and pet lovers, and more
🎳Other benefits may be added depending on your location

Required languages

English

B1 - Intermediate

The job ad is no longer active


Experience: from 10 years
Work format: Full Remote
Countries where we consider candidates: EU

DevOps

Employment: Full-time
Domain: Other
Product
