Senior Site Reliability Engineer (SRE) - AWS and GCP
Client
Our client is revolutionizing the retail direct store delivery model by addressing key challenges like communication gaps, out-of-stocks, invoicing errors, and price inconsistencies. Through innovative technology and strong partnerships, they help boost sales, increase profits, and enhance customer loyalty.
Position overview
We are seeking a skilled mid-level to senior Site Reliability Engineer (SRE) with hands-on experience in both AWS and Google Cloud Platform (GCP) to join a fast-paced, innovative project team. This role requires proactive monitoring, automation, and optimization of cloud infrastructure to ensure high availability, scalability, and security of mission-critical retail solutions.
The candidate should be available for at least four hours of overlapping work time with the New York time zone to ensure smooth collaboration and participation in team activities.
Responsibilities
- Design, build, and operate scalable and reliable systems on AWS and GCP cloud platforms
- Develop and maintain automation scripts to improve deployment, monitoring, and incident response
- Ensure system availability, latency, and overall reliability to meet service level objectives (SLOs)
- Collaborate with development and operations teams to implement best practices for security, monitoring, and infrastructure management
- Proactively troubleshoot and resolve infrastructure incidents and performance bottlenecks
- Participate in on-call rotations and incident management processes
- Continuously improve system architecture and automation to reduce manual intervention and improve efficiency
- Support CI/CD pipelines and infrastructure as code (IaC) initiatives
Requirements
- 4+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles
- Strong hands-on experience with AWS services (EC2, S3, VPC, Lambda, CloudWatch, IAM, etc.)
- Proven expertise with Google Cloud Platform (Compute Engine, GKE, Cloud Storage, IAM, Cloud Monitoring and Cloud Logging (formerly Stackdriver), etc.)
- Skilled in scripting and automation tools (Python, Bash, Terraform, Ansible, or similar)
- Experience managing container orchestration platforms such as Kubernetes or GKE
- Familiarity with CI/CD tools such as Jenkins, GitLab CI, or CircleCI
- Solid understanding of networking, security best practices, and cloud infrastructure design
- Comfortable working in agile, collaborative team environments
- Excellent communication skills and ability to work with distributed teams
- Availability for at least 4 hours of overlap with the New York time zone for meetings and collaboration
Required skills and experience
| SRE | 4 years |
| AWS | 4 years |
| GCP (Google Cloud Platform) | 4 years |
Required languages
| English | B2 - Upper Intermediate |