Senior AI/ML Engineer (LLMs, AI evaluation), up to $5000

DOIT Software Verified Employer

We’re looking for an Applied AI Engineer who combines strong ML fundamentals with the discipline of improving production AI systems through metrics, evaluation, and iteration.

This role is hands-on and product-focused, and involves close collaboration with the AI platform lead.

The role in a nutshell

You’ll work on improving production AI systems through evaluation, experimentation, and system design.

A large part of the role involves:

diagnosing failures in agent workflows
designing evaluation metrics and KPIs
improving system prompts and agent behavior
running structured experiments and measuring impact

You won’t be working in isolation on research projects — you’ll be improving systems that real users depend on.

Rough responsibility breakdown:

AI evaluation and KPI design — ~30%
Prompt and agent system design — ~30%
ML systems (recommendation, optimization, etc.) — ~30%
Engineering integration — ~10%

What you’ll work on

AI evaluation and system quality

Design evaluation strategies for LLM and agent workflows
Create metrics and KPIs for AI system performance
Build and maintain evaluation datasets
Debug production AI failures systematically
Compare system behavior against baselines

This is a core responsibility of the role.

Multi-agent AI systems

Improve agent orchestration and workflows
Diagnose failures across agent pipelines
Refine system prompts and agent interactions
Improve reliability, latency, and response quality

ML and AI systems

You’ll contribute to areas such as:

Recommendation systems (ranking and personalization)
Itinerary optimization and constraint-based planning
LLM-based reasoning systems
Optional: computer vision pipelines

Depth in one of these areas is more important than superficial experience in all of them.

Engineering collaboration

We use:

Golang (primary production language)
Python when necessary for ML workflows
Postgres, Redis, and internal services

You don’t need to be a Go expert on day one, but you should be comfortable reading and modifying production code.

Backend engineers handle infrastructure-heavy service development — your focus is AI system behavior and correctness.

What we’re looking for

Must-haves

Strong AI/ML fundamentals

You understand the theory behind what you build and can choose appropriate methods for a problem.

Examples:

evaluation metrics (precision, recall, F1, etc.)
ranking and recommendation concepts
embeddings and similarity
experimentation methodology

Not required:

academic publications
advanced theoretical math
large-scale model training experience

Evaluation-driven mindset

You:

think in metrics and baselines
design experiments instead of guessing
measure system improvements quantitatively
debug failures methodically

This is the most important signal for the role.

Experience with LLM systems

You’ve worked with:

prompt design
agent workflows
evaluation of LLM outputs
production LLM integrations

Ability to ship production systems

You can:

turn ideas into working systems
iterate based on results
balance exploration with delivery

Programming ability

You’re comfortable writing production code in at least one language (Python, Go, or similar) and learning others when needed.

Strong signals (nice to have)

Experience improving an AI system after deployment
Recommendation systems or ranking experience
Optimization or constraint-based systems
Computer vision experience
Experience building evaluation frameworks
Golang experience
Startup or small-team engineering experience

This role may not be a fit if

You are looking for a research-focused role without production deployment
You rely heavily on frameworks without understanding fundamentals
You’re uncomfortable working with partially defined problems
You prefer narrow specialization over product ownership

Required skills and experience

AI/ML

5 years

Required languages

English	B2 - Upper Intermediate
Ukrainian	Native

LLM, Multi-agent systems

Published 3 March

To apply for this and other jobs on Djinni, log in or sign up.

Minimum 5 years of experience
$3000-5000
Full remote
Countries where we consider candidates: Ukraine

ML / AI

Employment: Full-time
Domain: Travel / Tourism
Outstaff