AI Tech Lead / Senior Developer (MLOps Architect / ML Infrastructure Lead)

As an AI expert in the CTO’s office, you will be the technical owner of everything model-powered inside the company:

  • Architecture – Design the end-to-end pipeline that ingests org context, routes to the right expert model, executes code in sandboxed containers, and feeds rich telemetry back into our continuous-learning loop.
  • Model Strategy – Decide when we fine-tune open-source Llama-3 vs. hot-swap to Bedrock or Vertex; benchmark MoE routers for latency and cost; champion vLLM/Triton for GPU efficiency.
  • MLOps at Scale – Own versioning, lineage, policy gating, and rollback of models and inline tools. Ship deterministic, reproducible releases that DevSecOps trusts.
  • Tooling & Integrations – Work with backend and platform leads to expose new model endpoints through our Model Context Protocol (MCP) so agents can compose actions across GitHub, Jira, Terraform, Prometheus and more — without one-off plugins.
  • Thought Leadership – Partner with the CTO on the technical roadmap, publish internal RFCs, mentor engineers and evangelize best practices across the company and open-source community.

 

What You’ll Do

  • Craft cloud-native, microservice architectures for training, fine-tuning, and real-time inference (AWS/GCP/Azure, Kubernetes, JetStream).
  • Define SLOs for p95 agent latency, model success rate, and telemetry coverage; instrument with OTEL, Prometheus and custom reward models.
  • Drive our continuous-learning loop: reward modelling, ContextGraph enrichment, auto-tuning MoE routers.
  • Embed least-privilege IAM and OPA/ABAC policy checks into every stage of the model lifecycle.
  • Collaborate with product managers to translate customer pain into roadmap items, and with design partners to validate solutions in production.
  • Mentor a cross-functional squad of backend engineers, ML engineers and data scientists.

 

What You'll Bring

  • 5+ years in software engineering, with 3+ years architecting large-scale backend systems (Python, Go, Java or similar).
  • 4+ years designing, deploying and monitoring AI/ML systems in production.
  • Deep expertise in at least one of: large-language-model serving, MoE routing, RLHF, vector search, streaming inference.
  • Hands-on fluency with Kubernetes, Docker, CI/CD, IaC (Terraform/Helm) and distributed data technologies (Kafka, Spark, Arrow).
  • Proven MLOps track record (MLflow, Kubeflow, SageMaker, or similar) and a security-first mindset.
  • Ability to turn ambiguous business goals into a crisp, scalable architecture — and to communicate that vision to both executives and engineers.
  • Excellent English communication skills.

 

Nice-to-Haves

  • PhD or publications in ML/NLP/Systems.
  • Contributions to open-source LLM or MLOps projects.
  • Experience pushing real-time inference to the edge or to FPGA/ASIC accelerators.
  • Prior leadership of cross-functional AI/ML teams in a fast-growing startup environment.

     

The Way We Work

We value clarity, ownership, and velocity. You’ll have direct access to the CTO, autonomy to choose the right tech, and a front-row seat as we redefine how enterprises move “from prompt to production.”

If building the Kubernetes of AI-driven operations excites you, let’s talk.


 

Published 28 May