Infrastructure Solutions Architect with AI

About the role:

The Infrastructure Architect will have a strong background in designing, implementing, and scaling enterprise infrastructure, with a focus on AI workloads and platforms. This role will be instrumental in shaping the technical foundation required to support AI/ML models, data pipelines, and high-performance compute environments across our organization. It combines expertise in cloud, networking, and storage architectures with knowledge of AI infrastructure needs such as GPU clusters, model training/deployment environments, and MLOps frameworks.


Responsibilities:

Key Responsibilities | Architecture & Strategy

  • Design and implement infrastructure architectures to support enterprise AI workloads (training, inference, and data processing)
  • Define scalable strategies for on-prem, cloud, or hybrid environments optimized for AI/ML performance
  • Develop roadmaps for AI infrastructure adoption and integration into existing IT landscapes

Key Responsibilities | Infrastructure Engineering

  • Architect GPU/accelerator-based compute clusters and storage solutions optimized for large-scale AI workloads
  • Collaborate with data scientists and ML engineers to understand infrastructure requirements for model training and deployment
  • Ensure high availability, scalability, and cost-efficiency of AI workloads

Key Responsibilities | Cloud & DevOps

  • Design cloud-native solutions leveraging services like AWS SageMaker, Azure ML, or GCP Vertex AI
  • Establish MLOps pipelines and CI/CD frameworks for AI/ML (GitLab CI/CD, etc.)
  • Automate provisioning, monitoring, and scaling of AI infrastructure

Key Responsibilities | Governance & Security

  • Define best practices for data governance, compliance, and security in AI systems
  • Ensure responsible usage of AI infrastructure with strong observability and governance controls
  • Optimize resource utilization and manage budgets for high-performance compute environments


Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
  • 5+ years of experience in infrastructure architecture, cloud solutions, or systems engineering
  • Proven experience with AI/ML infrastructure (GPU clusters, distributed training, containerization, Kubernetes, etc.)
  • Strong knowledge of cloud platforms (AWS, Azure, GCP) and their AI/ML services
  • Experience with MLOps tools (Kubeflow, MLflow, Airflow, etc.)
  • Solid understanding of networking, storage, and security principles
  • Ability to communicate complex technical concepts to both technical and non-technical stakeholders
  • Strong development and tech-lead experience, preferably with Python


Nice to Have:

  • Experience with HPC (High-Performance Computing) or large-scale distributed systems
  • Hands-on experience with deep learning frameworks (e.g., PyTorch)
  • Knowledge of data platforms (Hadoop, etc.)
  • Familiarity with emerging generative AI infrastructure technologies (LLM hosting, vector databases, retrieval-augmented generation)

Required languages:

English B2 - Upper Intermediate