Lead LLM Ops / Cloud Ops Engineer
Key responsibilities:
As an LLMOps Engineer, you will be responsible for overseeing the complete lifecycle management of large language models (LLMs). This includes developing deployment strategies, continuous integration and delivery (CI/CD) processes, performance tuning, and ensuring high availability of our LLM services.
You will collaborate closely with data scientists, AI/ML engineers, and IT teams to define and align LLM operations with business goals, ensuring a seamless and efficient operating model.
In this role, you will:
- Define and disseminate LLMOps best practices.
- Evaluate and compare LLMOps tools and adopt those that best support these practices.
- Stay updated on industry trends and advancements in LLM technologies and operational methodologies.
- Participate in architecture design and validation sessions for Generative AI use cases with client entities.
- Contribute to the development and expansion of GenAI use cases, including standard processes, frameworks, templates, libraries, and best practices around GenAI.
- Design, implement, and oversee the infrastructure required for the efficient operation of large language models in collaboration with client entities.
- Provide expertise and guidance to client entities in the development and scaling of GenAI use cases, including standard processes, frameworks, templates, libraries, and best practices around GenAI.
- Serve as the expert and representative on LLMOps practices, including: (1) developing and maintaining CI/CD pipelines for LLM deployment and updates, (2) monitoring LLM performance, identifying and resolving bottlenecks, and implementing optimizations, and (3) ensuring the security of LLM operations through comprehensive risk assessments and the implementation of robust security measures.
- Collaborate with data and IT teams to facilitate data collection, preparation, and model training processes.
Required skills and experience:
- Practical experience with training, tuning, and utilizing LLMs/SLMs.
- Strong experience with GenAI/LLM frameworks and techniques, such as guardrails, LangChain, etc.
- Knowledge of LLM security and observability principles.
- Experience using Azure cloud services for ML.
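To illustrate the Azure and monitoring expectations above, here is a minimal, non-authoritative Python sketch of calling an Azure OpenAI chat deployment while capturing end-to-end latency, the kind of signal an LLMOps monitoring pipeline might export. The endpoint, deployment name, and environment-variable names are placeholders, not real resources.

```python
# Minimal sketch: timed call to an Azure OpenAI chat deployment.
# All resource names and env vars below are placeholders.
import os
import time

from openai import AzureOpenAI  # openai>=1.x SDK

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://<resource>.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def timed_completion(prompt: str) -> tuple[str, float]:
    """Return the model reply and end-to-end latency in seconds."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model="gpt-4o-deployment",  # hypothetical deployment name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    latency = time.perf_counter() - start
    return response.choices[0].message.content, latency

if __name__ == "__main__":
    reply, latency = timed_completion("Summarize what an LLMOps engineer does in one sentence.")
    print(f"latency_s={latency:.2f}")
    print(reply)
```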
Tech stack required:
Programming languages: Python
Public Cloud: Azure
Frameworks: Kubernetes (K8s), Terraform; Arize or any other ML/LLM observability tool
Experience: Experience with public LLM services such as OpenAI, Anthropic, and similar; experience deploying open-source LLMs is a plus
Tools: LangSmith/LangChain, guardrails
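As a rough illustration of the LangChain/LangSmith tooling listed above, the following is a minimal sketch (not a prescribed implementation) of a simple chain with LangSmith tracing enabled via environment variables; the API key, project name, and model name are placeholders, and it assumes recent langchain-core and langchain-openai packages.

```python
# Minimal sketch: a LangChain chain with LangSmith tracing switched on through
# environment variables, so each run is captured for observability.
# The API key and project name below are placeholders.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"          # enable LangSmith tracing
os.environ["LANGCHAIN_API_KEY"] = "<langsmith-key>"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "genai-use-case"   # hypothetical project name

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "{question}"),
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # assumes OPENAI_API_KEY is set
chain = prompt | llm | StrOutputParser()

if __name__ == "__main__":
    print(chain.invoke({"question": "What does an LLM observability tool track?"}))
```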