AI Tech Lead / Senior Developer (MLOps Architect / ML Infrastructure Lead)
As the AI Expert in the CTO’s office, you will be the technical owner of everything model-powered inside the company:
- Architecture – Design the end-to-end pipeline that ingests org context, routes to the right expert model, executes code in sandboxed containers, and feeds rich telemetry back into our continuous-learning loop.
- Model Strategy – Decide when we fine-tune open-source Llama-3 vs. hot-swap to Bedrock or Vertex; benchmark MoE routers for latency and cost; champion vLLM/Triton for GPU efficiency.
- MLOps at Scale – Own versioning, lineage, policy gating and rollback of models and in-line tools. Ship deterministic, reproducible releases that DevSecOps trusts.
- Tooling & Integrations – Work with backend and platform leads to expose new model endpoints through our Model Context Protocol (MCP) so agents can compose actions across GitHub, Jira, Terraform, Prometheus and more — without one-off plugins.
- Thought Leadership – Partner with the CTO on the technical roadmap, publish internal RFCs, mentor engineers and evangelize best practices across the company and open-source community.
What You’ll Do
- Craft cloud-native microservice architectures for training, fine-tuning and real-time inference (AWS/GCP/Azure, Kubernetes, JetStream).
- Define SLOs for p95 agent latency, model success rate, and telemetry coverage; instrument with OTEL, Prometheus and custom reward models.
- Drive our continuous-learning loop: reward modelling, ContextGraph enrichment, auto-tuning MoE routers.
- Embed least-privilege IAM and OPA/ABAC policy checks into every stage of the model lifecycle.
- Collaborate with product managers to translate customer pain into roadmap items and with design partners to validate solutions in production.
- Mentor a cross-functional squad of backend engineers, ML engineers and data scientists.
What You'll Bring
- 5+ years in software engineering, with 3+ years architecting large-scale backend systems (Python, Go, Java or similar).
- 4+ years designing, deploying and monitoring AI/ML systems in production.
- Deep expertise in at least one of: large-language-model serving, MoE routing, RLHF, vector search, streaming inference.
- Hands-on fluency with Kubernetes, Docker, CI/CD, IaC (Terraform/Helm) and distributed data technologies (Kafka, Spark, Arrow).
- Proven MLOps track record (MLflow, Kubeflow, SageMaker, or similar) and a security-first mindset.
- Ability to turn ambiguous business goals into a crisp, scalable architecture — and to communicate that vision to both executives and engineers.
- Excellent English communication skills.
Nice-to-Haves
- PhD or publications in ML/NLP/Systems.
- Contributions to open-source LLM or MLOps projects.
- Experience pushing real-time inference to the edge or FPGA/ASIC accelerators.
- Prior leadership of cross-functional AI/ML teams in a fast-growing startup environment.
The Way We Work
We value clarity, ownership, and velocity. You’ll have direct access to the CTO, autonomy to choose the right tech, and a front-row seat as we redefine how enterprises move “from prompt to production.”
If building the Kubernetes of AI-driven operations excites you, let’s talk.
Published 28 May