Senior MLOps Engineer
A US company is looking for a couple of Senior MLOps Engineers to join a complex, enterprise-level project maintaining a multi-agent AI platform. It is an interesting project with cutting-edge technologies, a distributed team, full-time employment, and an official contract. Remote work, flexible schedule.
Brief project description:
- The product is a SaaS platform offering multiple functions for sustainability and operational risk management. The project is dedicated to developing and maintaining an AI Center of Excellence.
Key areas of responsibility:
- Support application infrastructure to ensure the platform is optimized for performance and reliability.
- Build and maintain supporting infrastructure for cloud environments and applications.
- Lead the design and engineering of software systems for the AI/ML Platform, contributing to the full ML development life cycle.
- Identify and implement opportunities to automate and streamline ML development processes, fostering efficiency and effectiveness.
- Build and manage enterprise AI solutions using Azure AI Foundry and Azure's integrated platform (Azure OpenAI, AI Search, Prompt Flow).
- Use a wide range of Azure resources to implement scalable, reliable and cost-efficient services.
- Work side by side with your engineers, guiding critical projects, using your subject matter expertise to solve complex problems.
- Perform and automate system administration services, including installation, configuration, maintenance, and disaster recovery.
- Identify system-level issues related to OS configuration and virtual hardware bottlenecks.
- Build automation, auditing, and other tooling for security, compliance, and resource usage.
- Monitor and improve processes for all deployments.
- Take ownership of operational excellence by establishing and maintaining business-focused KPIs. Use these metrics not only as a basis for positive transformation, but also as a visible representation of your team's success.
- Take ownership of your tasks each sprint, ensuring time worked is logged and task details are added.
Required skills (at least one for each category):
- Excellent communication and collaboration skills.
- Ability to communicate effectively with multiple disciplines.
- Strong analytical skills with the ability to drive reliability improvements.
- Proven experience designing, implementing, and managing infrastructure for AI/ML or HPC workloads.
- Proven success managing and optimizing CI/CD for an Azure-based SaaS application.
- Hands-on experience with common DevOps tools, including Azure DevOps.
- Strong scripting skills in at least PowerShell or Bash.
- Experience with infrastructure-as-code and configuration management tools such as Terraform.
- Working knowledge of version control/source code management tools.
- Experience working in an agile software development life cycle.
- Working knowledge of databases, including SQL and NoSQL.
- Strong knowledge of web server configuration and management, e.g. IIS or Tomcat.
- Working knowledge of local and wide area networking and associated technologies, e.g. switches, routers, firewalls, VPNs.
- Strong understanding of security principles.
- Bachelor’s Degree in Computer Science or similar area of study.
- Experience working in a technical, fast-paced environment with high demand and high standards.
- Ability to assimilate information quickly under pressure.
- Strong planning skills to ensure business expectations are met and projects are delivered on time.
Tech Stack:
- Understanding of machine learning frameworks and libraries such as TensorFlow, PyTorch, or scikit-learn and their deployment in production environments is a plus.
- Cloud-based networking (VNETs, NSGs, Route Tables, hub-and-spoke architecture, Private Endpoints, etc.)
- Strong understanding of CI/CD (YAML and Classic Pipelines)
- PaaS (Web Apps, Function Apps, Storage Accounts, Key Vaults, Azure Redis Cache, Azure Service Bus, API Management, Azure Data Factory, Azure Databricks)
- Monitoring (New Relic, Logentries, Rapid7, Azure Monitor, etc.)
- Some experience with containers/Docker, either in a microservice-based or a service-oriented architecture
- Git, ARM, Terraform, Ansible, Azure CLI or similar.
- JavaScript, React, C#, Java, PowerShell, Bash or similar.
Work conditions:
- Distributed team, remote work.
- Kanban or Scrum approach, 5-6 members per team.
- Full-time (40 hours per week).
- Official contract: salary, sick-leave days, holidays, vacations.
Hiring process:
- Step 1 - preliminary interview (main questions) - 30 mins
- Step 2 - internal tech interview (tech questions) - 40-50 mins
- Step 3 - tech interview with team leader and architect - 1 hour
Required languages:
- English: B2 (Upper Intermediate)
DevOps, Azure DevOps, MLOps