Senior Site Reliability Engineer (SRE)
About Corvex
Corvex delivers unparalleled cloud-based AI infrastructure, featuring cutting-edge NVIDIA GPUs that combine exceptional reliability, security, performance, and value. We're building a world-class experience for developers and data scientists across enterprise and AI-native organizations, empowering professionals to focus exclusively on training, fine-tuning, and inference of their AI models while we manage the nuts and bolts of our premium infrastructure.
Company and Position Description
Corvex is seeking a Senior Site Reliability Engineer to help design, build, and operate our next-generation AI infrastructure platform. You will work across infrastructure-as-code, automation, Kubernetes, and private cloud environments. This role requires strong technical judgment, the ability to troubleshoot complex distributed systems, and the written communication skills necessary for clear, professional client interaction.
This position requires 5-6 hours of daily overlap with US Eastern Time and participation in a rotating on-call schedule (1 week every 6 weeks). Please factor both into your compensation expectations.
What You’ll Do
- Lead the design, deployment, and maintenance of infrastructure using Terraform and Ansible
- Build, operate, and optimize Kubernetes clusters
- Troubleshoot production issues across systems, networks, and platforms
- Work with private cloud platforms (OpenStack strongly preferred)
- Drive automation and improvements to CI/CD pipelines and operational tooling
- Collaborate with engineering teams on reliability, scalability, and architectural decisions
- Write clear, high-quality technical documentation and client-facing communication
- Participate in on-call rotation and refine incident response processes
What We’re Looking For
- 7+ years of experience in SRE, DevOps, or Systems Engineering roles
- Strong experience with Terraform, Ansible, and Kubernetes
- Excellent troubleshooting skills across Linux, networking, and distributed systems
- Experience with OpenStack or other private cloud environments (commercial or non-commercial)
- Excellent written English; able to communicate professionally with clients
- Ability to work effectively with 5-6 hours of overlap with US Eastern Time
- Willingness to participate in on-call rotation (1 week every 6 weeks)
What We Offer
- Competitive salary
- A chance to help define a new category of AI infrastructure
- Greenfield architecture: build the product you’ve always wanted to use
- High trust and autonomy, with deep impact on platform direction
- Remote-first culture with the option to collaborate in person as we scale
- Small, highly skilled team and zero bureaucracy
Required skills and experience
| SRE | 7 years |
| DevOps | 7 years |
| Terraform | 5 years |
| Ansible | 5 years |
| Kubernetes | 5 years |
| OpenStack | 1 year |
Required languages
| English | B2 - Upper Intermediate |