Network Ops Engineer - High-Performance Computing and Cloud Infrastructure

Position Overview:
 

We are looking for a Network Engineer to design and optimize a reliable, secure, and scalable high-performance network infrastructure. The ideal candidate will prioritize system reliability, proactively implementing best practices to minimize downtime and ensure consistent performance for mission-critical workloads. A strong commitment to tenant isolation and network security is essential to provide secure multi-tenant environments for our customers.


As a Network Engineer, you will design, deploy, and maintain high-performance network infrastructure for AI/ML applications, data centers, and private GPU clouds. You will work with cutting-edge networking technologies, including RDMA, InfiniBand, RoCE, and cloud-based networking solutions, to ensure low-latency, high-throughput communication across our distributed systems. 

 

Key Responsibilities:

 

  • Design, deploy, and maintain high-speed networking infrastructure for private and cloud-based AI/ML workloads. 
  • Configure and optimize RDMA/InfiniBand and RoCE networks for high-performance computing (HPC) clusters.
  • Manage networking for NVIDIA GPU cloud environments, ensuring low-latency data transfers.
  • Implement and manage network security policies, firewalls, and VPNs.
  • Automate network management tasks using Python, Ansible, or other scripting tools. 
  • Monitor, troubleshoot, and optimize network performance using tools like Prometheus, Grafana, and Wireshark. 
  • Collaborate with DevOps, AI, and cloud infrastructure teams to ensure seamless connectivity across distributed environments. 
  • Design and manage software-defined networking (SDN) solutions for private and hybrid cloud deployments. 
  • Ensure compliance with networking best practices, security policies, and industry regulations. 

 

Required Skills & Qualifications: 

 

  • Education: Computer Science, Networking, or a related field (or equivalent experience).
  • Networking Protocols: Deep understanding of TCP/IP, UDP, MLAG, GRE, BGP, VLANs, VXLANs, and MPLS. 
  • High-Performance Networking: Experience with RDMA, InfiniBand, Bare Metal, Virtualized Partitioning, and NVIDIA/Mellanox Cumulus networking solutions.
  • Network Security: Proficiency in firewalls (pfSense), IDS/IPS, and WireGuard VPN technologies.
  • Cloud Networking: Experience with virtual networking and cloud networking solutions.
  • Automation & Scripting: Knowledge of scripting languages, Ansible, and Terraform for network automation.
  • Monitoring & Troubleshooting: Familiarity with Wireshark, tcpdump, Prometheus, Grafana, and SNMP-based monitoring. 
  • Infrastructure as Code (IaC): Experience automating network configurations with Terraform and Ansible.

Preferred Qualifications:

  • Experience in designing network architectures for AI/ML and high-performance computing (HPC) environments. 
  • Knowledge of data center networking with large-scale deployments. 
  • Experience with zero-trust security frameworks.

Soft Skills:

  • Ability to adapt and thrive in an ambiguous, fast-moving startup environment. 
  • Strong problem-solving skills and the ability to diagnose complex networking issues. 
  • A strong sense of ownership and accountability, with a commitment to end-to-end problem-solving when working with external customers.
  • Excellent communication skills, with the ability to work across DevOps, AI, and cloud engineering teams.
  • Detail-oriented mindset, ensuring high availability and reliability of network infrastructure.
     

Benefits:

 

  • Flexible hybrid work arrangements.
  • Performance-based bonuses.
  • Access to cutting-edge AI/ML infrastructure.