Senior AI/ML Engineer (LLMs, AI evaluation) to $5000

DOIT Software Verified Employer

We’re looking for an Applied AI Engineer who combines strong ML fundamentals with the discipline of improving production AI systems through metrics, evaluation, and iteration.

This role is hands-on, product-focused, and collaborative with the AI platform lead.

The role in a nutshell

You’ll work on improving production AI systems through evaluation, experimentation, and system design.

A large part of the role involves:

  • diagnosing failures in agent workflows
  • designing evaluation metrics and KPIs
  • improving system prompts and agent behavior
  • running structured experiments and measuring impact

You won’t be working in isolation on research projects — you’ll be improving systems that real users depend on.

Rough responsibility breakdown:

  • AI evaluation and KPI design — ~30%
  • Prompt and agent system design — ~30%
  • ML systems (recommendation, optimization, etc.) — ~30%
  • Engineering integration — ~10%

What you’ll work on

AI evaluation and system quality

  • Design evaluation strategies for LLM and agent workflows
  • Create metrics and KPIs for AI system performance
  • Build and maintain evaluation datasets
  • Debug production AI failures systematically
  • Compare system behavior against baselines

This is a core responsibility of the role.

Multi-agent AI systems

  • Improve agent orchestration and workflows
  • Diagnose failures across agent pipelines
  • Refine system prompts and agent interactions
  • Improve reliability, latency, and response quality

ML and AI systems

You’ll contribute to areas such as:

  • Recommendation systems (ranking and personalization)
  • Itinerary optimization and constraint-based planning
  • LLM-based reasoning systems
  • Optional: computer vision pipelines

Depth in one of these areas is more important than superficial experience in all of them.

Engineering collaboration

We use:

  • Golang (primary production language)
  • Python when necessary for ML workflows
  • Postgres, Redis, and internal services

You don’t need to be a Go expert on day one, but you should be comfortable reading and modifying production code.

Backend engineers handle infrastructure-heavy service development — your focus is AI system behavior and correctness.

What we’re looking for

Must-haves

Strong AI/ML fundamentals

You understand the theory behind what you build and can choose appropriate methods for a problem.

Examples:

  • evaluation metrics (precision/recall/F1/etc.)
  • ranking and recommendation concepts
  • embeddings and similarity
  • experimentation methodology

Not required:

  • academic publications
  • advanced theoretical math
  • large-scale model training experience

Evaluation-driven mindset

You:

  • think in metrics and baselines
  • design experiments instead of guessing
  • measure system improvements quantitatively
  • debug failures methodically

This is the most important signal for the role.

Experience with LLM systems

You’ve worked with:

  • prompt design
  • agent workflows
  • evaluation of LLM outputs
  • production LLM integrations

Ability to ship production systems

You can:

  • turn ideas into working systems
  • iterate based on results
  • balance exploration with delivery

Programming ability

You’re comfortable writing production code in at least one language (Python, Go, or similar) and learning others when needed.

Strong signals (nice to have)

  • Experience improving an AI system after deployment
  • Recommendation systems or ranking experience
  • Optimization or constraint-based systems
  • Computer vision experience
  • Experience building evaluation frameworks
  • Golang experience
  • Startup or small-team engineering experience

This role may not be a fit if

  • You’re looking for a research-focused role without production deployment
  • You rely heavily on frameworks without understanding fundamentals
  • You’re uncomfortable working with partially defined problems
  • You prefer narrow specialization over product ownership

Required experience

AI/ML: 5 years

Required languages

English: B2 (Upper Intermediate)
Ukrainian: Native
LLM, Multi-agent systems
Published 3 March