Senior Network Data Center Engineer at Corvex
About Corvex
Corvex delivers unparalleled cloud-based AI infrastructure, featuring cutting-edge NVIDIA GPUs that combine exceptional reliability, security, performance, and value. We're ready to build a world-class experience for developers and data scientists across enterprise and AI-native organizations that will enable professionals to focus exclusively on training, fine-tuning, and inference of their AI models, while we manage the nuts and bolts of our premium infrastructure.
Position Overview:
We are looking for an engineer to design, optimize, and ensure the reliability and scalability of our high-performance network infrastructure. The ideal candidate will prioritize system reliability, proactively implementing best practices to minimize downtime and ensure consistent performance for mission-critical workloads.
You will design, deploy, and maintain high-performance network infrastructure for AI/ML applications, data centers, and private GPU clouds. You will work with cutting-edge networking technologies, including RDMA, InfiniBand, and cloud-based networking solutions, to ensure low-latency, high-throughput communication across our distributed systems.
This role is also responsible for the deployment, maintenance, and operation of data center infrastructure, including servers, networking hardware, and structured cabling. You will ensure optimal uptime and performance of the data center environment while supporting new infrastructure builds and ongoing capacity planning.
Key Responsibilities:
- Design, deploy, maintain, and tune high-speed networking infrastructure for private and cloud-based AI/ML workloads.
- Configure and optimize RDMA/InfiniBand networks for high-performance computing (HPC) clusters.
- Manage and tune networking for NVIDIA GPU cloud environments, ensuring low-latency data transfers.
- Implement, manage, and tune network security policies, firewalls, and VPNs.
- Automate network management tasks using Python, Ansible, or other scripting tools.
- Monitor, troubleshoot, and optimize network performance using tools like Prometheus, Grafana, and Wireshark.
- Collaborate with DevOps, AI, and cloud infrastructure teams to ensure seamless connectivity across distributed environments.
- Design, manage, and tune software-defined networking (SDN) solutions for private and hybrid cloud deployments.
- Ensure compliance with networking best practices, security policies, and industry regulations.
- Travel to data centers to help install, configure, and maintain server and network equipment.
Required Skills & Qualifications:
- Education: Degree in Computer Science, Networking, or a related field (or equivalent experience).
- Networking Protocols: Deep understanding of TCP/IP, BGP, VLANs, VXLANs, and MPLS.
- High-Performance Networking: Experience with RDMA, InfiniBand (including partition keys, or P_Keys), and NVIDIA/Mellanox Cumulus networking solutions.
- Edge Networking: Experience supporting Juniper routers.
- Network Security: Proficiency in firewalls (pfSense), IDS/IPS, and WireGuard VPN technologies.
- Cloud Networking: Experience with virtual networking and cloud networking solutions.
- Automation & Scripting: Knowledge of Python, Ansible, Terraform, and Bash for network automation.
- Monitoring & Troubleshooting: Familiarity with Wireshark, tcpdump, Prometheus, Grafana, and SNMP-based monitoring.
- Infrastructure as Code (IaC): Experience automating network configurations with Terraform and Ansible.
- Data Center: Experience in data center operations or IT infrastructure. Familiarity with rack-and-stack procedures, fiber/copper cabling, and network troubleshooting.
Preferred Qualifications:
- Experience in designing network architectures for AI/ML and high-performance computing (HPC) environments.
- Knowledge of data center networking in large-scale deployments.
- Familiarity with private networks and edge computing.
- Experience with zero-trust security frameworks.
Soft Skills:
- Ability to adapt and thrive in an ambiguous, fast-moving startup environment.
- Strong problem-solving skills and the ability to diagnose complex networking issues.
- Excellent communication skills, with the ability to work across DevOps, AI, and cloud engineering teams.
- Detail-oriented mindset, ensuring high availability and reliability of network infrastructure.
Benefits:
- Flexible hybrid work arrangements.
- Performance-based bonuses.
- Access to cutting-edge AI/ML infrastructure.
Required Languages:
- English: C1 (Advanced)