Infrastructure Solutions Architect with AI
About the role:
The Infrastructure Architect will have a strong background in designing, implementing, and scaling enterprise infrastructure with a focus on AI workloads and platforms. This role will be instrumental in shaping the technical foundation required to support AI/ML models, data pipelines, and high-performance compute environments across our organization. The role will combine expertise in cloud, networking, and storage architectures with knowledge of AI infrastructure needs such as GPU clusters, model training/deployment environments, and MLOps frameworks.
Responsibilities:
Key Responsibilities | Architecture & Strategy
- Design and implement infrastructure architectures to support enterprise AI workloads (training, inference, and data processing)
- Define scalable strategies for on-prem, cloud, or hybrid environments optimized for AI/ML performance
- Develop roadmaps for AI infrastructure adoption and integration into existing IT landscapes
Key Responsibilities | Infrastructure Engineering
- Architect GPU/accelerator-based compute clusters and storage solutions optimized for large-scale AI workloads
- Collaborate with data scientists and ML engineers to understand infrastructure requirements for model training and deployment
- Ensure high availability, scalability, and cost-efficiency of AI workloads
Key Responsibilities | Cloud & DevOps
- Design cloud-native solutions leveraging services like AWS SageMaker, Azure ML, or GCP Vertex AI
- Establish MLOps pipelines and CI/CD frameworks for AI/ML (GitLab CI/CD, etc.)
- Automate provisioning, monitoring, and scaling of AI infrastructure
Key Responsibilities | Governance and Security
- Define best practices for data governance, compliance, and security in AI systems
- Ensure responsible usage of AI infrastructure with strong observability and governance controls
- Optimize resource utilization and manage budgets for high-performance compute environments
Requirements:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
- 5+ years of experience in infrastructure architecture, cloud solutions, or systems engineering
- Proven experience with AI/ML infrastructure (GPU clusters, distributed training, containerization, Kubernetes, etc.)
- Strong knowledge of cloud platforms (AWS, Azure, GCP) and their AI/ML services
- Experience with MLOps tools (Kubeflow, MLflow, Airflow, etc.)
- Solid understanding of networking, storage, and security principles
- Ability to communicate complex technical concepts to both technical and non-technical stakeholders
- Strong development / tech lead (TL) experience, preferably with Python
Nice to Have:
- Experience with HPC (High-Performance Computing) or large-scale distributed systems
- Hands-on experience with deep learning frameworks (PyTorch)
- Knowledge of data platforms (Hadoop, etc.)
- Familiarity with emerging generative AI infrastructure technologies (LLM hosting, vector databases, retrieval-augmented generation)
Required languages
English | B2 - Upper Intermediate