Middle Site Reliability Engineer (SRE)
Full Remote · Worldwide · Product · 2 years of experience · English - B2
Role Overview:
We are assisting our partner iGaming company in expanding their engineering team responsible for ensuring the stability and predictable behaviour of their distributed services and platforms. The role involves working with production infrastructure, analysing system behaviour, and implementing practices that improve reliability across multiple platforms.
This position is intended for engineers who clearly understand the difference between SRE and DevOps practices, and for whom SLOs, error budgets, and availability targets such as 99.85–99.95% are practical tools rather than abstract concepts.
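For a sense of scale, an availability SLO translates directly into an error budget: the amount of unavailability a service may accumulate in a measurement window before the SLO is breached. The sketch below is illustrative only. It assumes a 30-day window and purely time-based availability (the posting specifies neither, and many teams use request-based SLIs instead), and `error_budget` is a hypothetical helper name.

```python
from datetime import timedelta


def error_budget(slo_percent: float, window: timedelta = timedelta(days=30)) -> timedelta:
    """Return the downtime allowed by a time-based availability SLO over `window`.

    Hypothetical helper: assumes availability = uptime / total time in the window.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    # The budget is the unavailable fraction of the window, e.g. 0.05% for a 99.95% SLO.
    return window * ((100 - slo_percent) / 100)


if __name__ == "__main__":
    for slo in (99.85, 99.9, 99.95):
        minutes = error_budget(slo).total_seconds() / 60
        print(f"{slo}% over 30 days -> {minutes:.1f} min of allowed downtime")
    # prints 64.8, 43.2 and 21.6 min respectively
```

At the strict end of the range, a 99.95% target leaves roughly 21.6 minutes of downtime per 30 days, so a single slow incident response can consume most of a month's budget.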
The engineer will work as part of an SRE shift schedule covering late-evening and night hours (17:00–01:00 and 00:00–08:00 CET, in rotation) to ensure end-to-end ownership of incidents, from user impact to root cause and follow-up improvements.
Key Responsibilities:
- Contributing to architectural changes affecting the reliability and scalability of services and platforms;
- Operating and improving Kubernetes clusters (cluster model, networking, ingress, load balancing);
- Working with AWS-based environments (networking, storage, compute, managed services);
- Managing infrastructure with Terraform and configuration with Ansible;
- Developing and refining monitoring and observability across platforms (Prometheus, Alertmanager, Grafana, and log aggregation such as ELK/Loki);
- Participating in incident handling: initial classification, technical investigation, coordination with product/engineering teams, and follow-up improvements;
- Reducing operational toil and building tools that support reliability and efficiency (internal utilities, automation, CI/CD improvements);
- Collaborating with development teams to embed SRE practices into the service lifecycle (SLIs/SLOs, error budgets, production readiness).
Ideal profile for the position:
Core skills:
- Strong Linux skills in production environments (debugging, performance, system services);
- Solid understanding of networking (TCP/IP, DNS, HTTP, load balancing, TLS);
- Hands-on experience operating Kubernetes in production (not just local clusters);
- Experience with AWS cloud services (for example: EC2, ALB/NLB, RDS, S3, IAM, EKS or self-managed Kubernetes);
- Confident use of Terraform and Ansible in real environments (multi-environment IaC, reusable modules/roles);
- Experience with observability tools:
  - metrics and alerting (Prometheus/Alertmanager or similar),
  - dashboards (Grafana or similar),
  - logging (ELK stack, Loki or comparable solutions);
- Ability to troubleshoot across application, network, and infrastructure layers, using scripting and tools (Python/Go/Bash, curl, tcpdump, log analysis, etc.);
- Experience with containers and image lifecycle (Docker or compatible runtimes).
Experience:
- Participation in production incidents and technical post-incident reviews (not just on-call escalation);
- 2–5 years of practical experience in SRE, infrastructure, platform or production-focused DevOps engineering;
- Experience working within CI/CD pipelines (for example: Jenkins, GitLab CI, GitHub Actions, ArgoCD or similar);
- Exposure to environments with high availability requirements (e.g. low tolerance for downtime, strict SLAs/SLOs);
- Availability to work between 5 PM and 8 AM CET, in the following shifts: 17:00–01:00 and 00:00–08:00.
What will be an advantage:
- Experience with high-load or real-time systems (payments, finance, gaming, streaming);
- Experience with CDNs or real-time log aggregation/analytics;
- Familiarity with databases and message systems (for example: PostgreSQL, MySQL, MongoDB, Kafka, Redis, RabbitMQ);
- Experience with external integrations and third-party APIs (payment providers, KYC, risk/anti-fraud, content providers);
- Experience with service meshes, API gateways or ingress controllers (Istio, Linkerd, NGINX, Envoy, etc.).
Success Metrics:
- Maintain and improve SLOs for key services in the 99.85–99.95% availability range, with clear SLIs and error budgets;
- Keep unplanned downtime below 1% for critical user-facing functionality;
- Ensure that the majority of infrastructure and platform configuration (target ≥ 90–95%) is managed as code (Terraform, Ansible, Kubernetes manifests/Helm charts);
- Systematically reduce MTTR (Mean Time To Recovery) for incidents by improving detection, diagnostics and standard operating procedures;
- Prevent repeated high-severity incidents by driving post-incident reviews and concrete follow-up actions (configuration changes, automation, runbooks, architectural adjustments);
- Maintain up-to-date operational documentation and runbooks for core services, so that incidents can be handled consistently across the team.
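MTTR itself is a simple average: total time to restore service divided by the number of incidents in the period. A minimal sketch, assuming each incident's recovery time is already measured (where that interval starts, at user impact or at detection, varies between teams and is not defined in the posting); the `mttr` helper and the sample durations are hypothetical.

```python
from datetime import timedelta


def mttr(recovery_times: list[timedelta]) -> timedelta:
    """Mean Time To Recovery: average of per-incident recovery durations (hypothetical helper)."""
    if not recovery_times:
        raise ValueError("at least one incident is required to compute MTTR")
    # sum() needs a timedelta start value, since its default start of 0 is an int.
    total = sum(recovery_times, timedelta())
    return total / len(recovery_times)


if __name__ == "__main__":
    # Hypothetical incidents in one month: 12, 45 and 30 minutes to recover.
    incidents = [timedelta(minutes=m) for m in (12, 45, 30)]
    print(mttr(incidents))  # prints 0:29:00
```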
The company guarantees you the following benefits:
- Global Collaboration: Join an international team where everyone treats each other with respect and works towards the same goal;
- Autonomy and Responsibility: Enjoy the freedom and responsibility to make decisions without constant supervision;
- Competitive Compensation: Receive a competitive salary that reflects your expertise and knowledge, as our partner seeks top performers;
- Remote Work Opportunities: Embrace the flexibility of fully remote work, with the option to visit company offices near your current location;
- Paid Time Off: Prioritise work-life balance with paid vacation and sick leave days to prevent burnout;
- Career Development: Access continuous learning and career development opportunities to enhance your professional growth;
- Corporate Culture: Experience a vibrant corporate atmosphere with exciting parties and team-building events throughout the year;
- Referral Bonuses: Refer talented friends and receive a bonus after they successfully complete their probation period;
- Medical Insurance Support: Choose the private medical insurance that suits you and receive full or partial compensation of its cost;
- Flexible Benefits: Customise your compensation by selecting activities or expenses you'd like the company to cover, such as a gym subscription, language courses, Netflix subscription, spa days, and more;
- Education Foundation: Participate in a biannual raffle for a chance to learn something new unrelated to your job, as part of your commitment to ongoing education.
Interview process:
- A 30-minute interview with a Recruiter to get to know you and your experience;
- 1st stage of technical interview (1 h) with the DevOps team to assess your theoretical skills;
- 2nd stage of technical interview (1 h) with the DevOps team to assess your hard skills;
- A final 1-hour interview to gauge your fit with the company culture and working style.
If you find this opportunity right for you, don't hesitate to apply or get in touch with us if you have any questions!
DevOps Engineer (with Java)
Full Remote · Worldwide · Product · 3 years of experience · English - B2
Games Inc. is looking for a skilled DevOps Engineer (with Java) to help build, scale, and maintain the backbone of its gaming platform. You'll be hands-on in coding, problem-solving, and collaborating across teams to ensure the platform is robust, reliable, and future-ready.
About the company:
Games Inc. is a long-established game studio with over 10 years of experience delivering high-quality real-money casino games. Renowned for visually striking and mathematically rich content, the studio collaborates with top-tier operators across regulated and unregulated markets. The diverse portfolio includes classic slots, bespoke table games, and next-generation crash games, crafted by a team of industry experts.
With a proprietary Remote Gaming Server (RGS) and full licensing, Games Inc. ensures full control, scalability, and compliance. Now expanding distribution, accelerating production, and investing in scalable tech, the studio is evolving into a globally recognised supplier.
What You Will Do:
As a DevOps Engineer (with Java) at Games Inc., you will sit at the intersection of software engineering and infrastructure. You will be a hands-on contributor to the core Java platform while simultaneously leading the charge on our Cloud infrastructure and DevOps maturity.
From player session management to automated deployment pipelines, you will help us build a "self-service" platform foundation. You will work closely with the Tech Lead and Chief Architect not only to write clean, scalable Java code but also to drive architectural decisions regarding cloud-native adoption, CI/CD optimization, and infrastructure automation.
Summary of Responsibilities:
- Infrastructure & Architecture: Designing and implementing secure, scalable Cloud infrastructure using Terraform (IaC);
- Backend Development: Writing clean, tested, and high-performance Java code for core platform services and APIs;
- Helping migrate applications written in other languages to Java;
- DevOps Culture: Evangelizing DevOps practices within the team; becoming the go-to person for cloud knowledge sharing and helping other developers understand infrastructure;
- CI/CD Optimization: Taking ownership of the deployment pipelines (Jenkins/GitHub), ensuring fast, reliable, and automated releases;
- Reliability: Improving system observability (Prometheus/Grafana) and leading root-cause analysis for production issues;
- Modernization: Helping migrate legacy systems to containerized (Docker) and cloud-native solutions, reducing technical debt.
Tech Stack:
The team is modernising the platform, and youβll get to work with:
- Core Backend: Java (primary), with some exposure to TypeScript, Node.js, Python;
- Infrastructure as Code: Terraform (must have), Docker;
- Cloud Provider Experience: AWS (primary, with EC2, ALB, Route53, IAM, VPC, etc.), GCP;
- Observability & CI/CD: Jenkins, GitHub Actions, Prometheus, Grafana, cAdvisor;
- Databases: MongoDB (primary), Postgres;
- iGaming experience: obligatory.
What you'll need to have:
- 3+ years of experience in a hybrid Backend/DevOps role or Systems Engineering;
- Proficiency in Java: Comfortable writing complex backend logic and APIs;
- Deep cloud knowledge: You understand how to architect for cost, security, and high availability;
- Strong IaC experience: You treat Terraform configuration as production software;
- CI/CD Ownership: Experience setting up and maintaining pipelines (Jenkins/GitHub);
- Containerization: Experience with Docker and orchestration concepts;
- Collaborative mindset: You are eager to share your Cloud/DevOps knowledge with the wider engineering team.
What Makes This Opportunity Unique:
- A remote-first culture built on trust, flexibility, and respect;
- The chance to work on a platform powering globally distributed casino games;
- A leadership team that values technical input and innovation;
- Autonomy and ownership over meaningful parts of the platform;
- Competitive salary, holiday, and benefits (tailored depending on location).
If you find this opportunity right for you, don't hesitate to apply or get in touch with us if you have any questions!
NextChallenge connects exceptional talent with rewarding opportunities through personalised approaches and extensive IT industry expertise. We carefully identify professionals and align them with compatible organisations. NextChallenge excels in matching top talent with rewarding careers. Moreover, our HR consultancy refines practices, empowers teams, and fosters enduring success.
Website:
https://nextchallenge.com/