MLOps Engineer
We are seeking an experienced MLOps Engineer with a strong background in building and deploying machine learning (ML) models and large language models (LLMs). The ideal candidate will have mastery of data science platforms, significant software engineering experience, and expertise in LLMOps best practices. This role involves working on high-scale AI deployments, optimizing LLM pipelines, and implementing responsible AI techniques.
Key Responsibilities:
⦿ ML & LLM Model Development
• Build, deploy, and manage ML and LLM models on cloud-native platforms such as Microsoft Azure, AWS, or Google Cloud Platform (GCP).
• Utilize LLM-specific frameworks like LangChain, LangGraph, and LlamaIndex to develop sophisticated solutions, incorporating prompt engineering and dynamic response handling.
• Apply LLMOps best practices for model development, training, fine-tuning, and monitoring, ensuring scalability and high availability.
⦿ Software Engineering & Production Optimization
• Drive software engineering efforts focused on scaling LLM and ML models in low-latency, high-throughput production environments.
• Implement responsible AI practices, integrating LLM guardrails to maintain model reliability and ethical standards in production.
• Streamline response generation through data and LLM response streaming and parallelized workloads to improve model performance.
⦿ Cloud-Native Computing & Deployment
• Design and maintain CI/CD pipelines for LLM deployment, leveraging DevOps principles and cloud-native technologies like Docker and Kubernetes.
• Establish and manage scalable cloud infrastructure, optimizing resource utilization for parallelized ML and LLM workloads.
⦿ Advanced LLM Operations & Data Engineering
• Implement advanced LLM operations, including Retrieval-Augmented Generation (RAG), multi-agent deployments (CrewAI, AutoGen), and vector databases for efficient context retrieval.
• Develop and maintain data engineering pipelines using technologies like Apache Spark and manage message queues (RabbitMQ, Kafka) to support real-time model integrations.
⦿ Database Management & Optimization
• Optimize and manage databases such as Postgres, MongoDB, SQL Server, Redis, and vector databases to support ML and LLM workloads and enhance retrieval efficiency.
• Ensure data integrity and model performance by fine-tuning database configurations for high-volume LLM deployments.
Required Qualifications:
Education
• Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
Experience
• 8+ years of relevant experience, including mastery of data science platforms such as Microsoft Azure, AWS, or GCP for building and deploying ML and LLM models.
• Significant software engineering experience with a focus on ML/LLM model production, scalability in low-latency environments, and responsible AI implementation.
• Proven expertise in LLMOps practices and frameworks.
Technical Skills
• Programming & Frameworks: Proficiency in object-oriented programming languages and LLM-specific frameworks with expertise in prompt engineering.
• Cloud & DevOps: Advanced understanding of Docker, Kubernetes, cloud-native computing, and DevOps practices for ML and LLM deployments.
• Data & Vector Management: In-depth knowledge of vector databases, LLM fine-tuning techniques, and the implementation of LLM guardrails.
• Multi-Agent Systems: Experience with chatbot and multi-agent system deployments, including CrewAI, AutoGen, and LangGraph.
• Data Engineering & Messaging: Experience with data engineering, message queues (RabbitMQ, Kafka), and programming languages such as Python, SQL, C++, and R.
• Database Optimization: Proficiency in databases such as Postgres, MongoDB, SQL Server, Redis, and their optimization for ML/LLM workloads.
Key Competencies:
• Strong problem-solving skills with a proactive attitude towards continuous learning.
• Visionary mindset for shaping the future direction of LLMs and AI/ML at scale.
• Effective communication skills for translating technical concepts to non-technical stakeholders.
• A collaborative mindset and an inclination to mentor and uplift the technical capabilities of the team.