Senior MLOps Engineer

A US company is looking for a couple of Senior MLOps Engineers to join a complex, enterprise-level project maintaining a multi-agent AI platform. It is an interesting project with cutting-edge technologies and a distributed team, offering full-time work under an official contract, remote work, and a flexible schedule.

 

Brief project description:

  • The product is a SaaS platform that provides multiple functions for sustainability and operational risk management. The project is dedicated to developing and maintaining an AI Center of Excellence.

 

Key areas of responsibility:

  • Support application infrastructure to ensure the platform is optimized for performance and reliability.
  • Build and maintain supporting infrastructure for cloud environments and applications.
  • Lead the design and engineering of software systems for the AI/ML platform, contributing to the full ML development life cycle.
  • Identify and implement opportunities to automate and streamline ML development processes, improving efficiency and effectiveness.
  • Build and manage enterprise AI solutions using Azure AI Foundry and Azure's integrated platform (Azure OpenAI, AI Search, Prompt Flow).
  • Use a wide range of Azure resources to implement scalable, reliable and cost-efficient services.
  • Work side by side with your engineers, guiding critical projects and using your subject-matter expertise to solve complex problems.
  • Perform and automate system administration services, including installation, configuration, maintenance, and disaster recovery.
  • Identify system-level issues related to OS configuration and virtual hardware bottlenecks.
  • Build automation, auditing, and other tooling for security, compliance, and resource usage; monitor and improve processes for all deployments.
  • Take ownership of operational excellence to establish and maintain business-focused KPIs. Use these metrics not only as a basis for positive transformation, but also as a visible representation of your team’s success.
  • Maintain and take ownership of your tasks each sprint, ensuring that time worked is logged and task details are added.

 

Required skills (at least one for each category):

  • Excellent communication and collaboration skills.
  • Ability to communicate effectively with multiple disciplines.
  • Strong analytical skills with the ability to drive reliability improvements.
  • Proven experience designing, implementing, and managing infrastructure for AI/ML or HPC workloads.
  • Proven success managing and optimizing CI/CD for an Azure-based SaaS application.
  • Hands-on experience with common DevOps tools, including Azure DevOps.
  • Strong scripting skills in at least one of PowerShell or Bash.
  • Experience with infrastructure-as-code and configuration management tools like Terraform.
  • Working knowledge of version control/source code management tools.
  • Experience working in an agile software development life cycle.
  • Working knowledge of databases, including SQL and NoSQL.
  • Strong knowledge of configuring and managing web servers such as IIS or Tomcat.
  • Working knowledge of local and wide area networking and associated technologies, e.g. switches, routers, firewalls, VPNs.
  • Strong understanding of security principles.
  • Bachelor’s Degree in Computer Science or similar area of study.
  • Experience working in a fast-paced technical environment with high demand and high standards.
  • Ability to assimilate information quickly under pressure.
  • Strong planning skills to ensure business expectations are met and projects are delivered on time.

 

Tech Stack:

  • Understanding of machine learning frameworks and libraries such as TensorFlow, PyTorch, or scikit-learn, and of their deployment in production environments, is a plus.
  • Cloud-based networking (VNets, NSGs, Route Tables, Hub/Spoke architecture, Private Endpoints, etc.)
  • Strong understanding of CI/CD (YAML and Classic Pipelines)
  • PaaS (Web Apps, Function Apps, Storage Accounts, Key Vaults, Azure Cache for Redis, Azure Service Bus, API Management, Azure Data Factory, Azure Databricks)
  • Monitoring (New Relic, Logentries, Rapid7, Azure Monitor, etc.)
  • Some experience with containers/Docker, for either a microservice-based or a service-oriented architecture
  • Git, ARM, Terraform, Ansible, Azure CLI or similar.
  • JavaScript, React, C#, Java, PowerShell, Bash or similar.

 

Work conditions:

  • Distributed team, remote work.
  • Kanban or Scrum approach, 5-6 members per team.
  • Full-time (40 hours per week).
  • Official contract: salary, sick-leave days, holidays, vacations.

 

Hiring process:

  • Step 1 - preliminary interview (main questions) - 30 mins
  • Step 2 - internal tech interview (tech questions) - 40-50 mins
  • Step 3 - tech interview with team leader and architect - 1 hour

Required languages

English B2 - Upper Intermediate
DevOps, Azure DevOps, MLOps
Published 3 February