Senior AI/ML Engineer (Applied AI, Evaluation-Driven)
Who:
We’re looking for a Senior AI/ML Engineer who combines strong machine learning fundamentals with a disciplined, evaluation-first approach to improving production AI systems.
What:
You’ll improve real AI systems used by real users — through metrics, experimentation, system design, and iteration.
When:
Start ASAP.
Where:
Remote (any location).
Type:
Full-time.
English:
Upper-intermediate or Advanced required.
The Role in a Nutshell
This is not a pure research position.
You’ll improve production AI systems by running structured evaluations, diagnosing failures, refining prompts and agent workflows, and designing measurable improvements.
You won’t be working in isolation — you’ll improve systems that users depend on daily.
Responsibility Breakdown (Approximate)
- AI evaluation & KPI design — ~30%
- Prompt and multi-agent system design — ~30%
- ML systems (recommendation, optimization, etc.) — ~30%
- Engineering integration — ~10%
What You’ll Work On
AI Evaluation & System Quality (Core Responsibility)
- Design evaluation strategies for LLM and agent workflows
- Define metrics and KPIs for AI system performance
- Build and maintain evaluation datasets
- Systematically debug production AI failures
- Compare system behavior against baselines
- Run structured experiments and measure real impact
This is a central part of the role.
Multi-Agent AI Systems
- Improve agent orchestration and workflows
- Diagnose failures across agent pipelines
- Refine system prompts and agent interactions
- Improve reliability, latency, and output quality
Applied ML Areas (Depth > Breadth)
You’ll contribute to one or more of the following areas:
- Recommendation systems (ranking & personalization)
- Itinerary optimization & constraint-based planning
- LLM-based reasoning systems
- (Optional) Computer vision pipelines
Depth in at least one of these areas is more important than shallow experience in many.
Engineering Environment
We use:
- Go (primary production language)
- Python for ML workflows
- Postgres, Redis, internal services
You don’t need to be a Go expert on day one, but you must be comfortable reading and modifying production code.
Backend engineers own infrastructure-heavy services — your focus is on AI system behavior, correctness, and measurable improvement.
What We’re Looking For
1️⃣ Strong AI/ML Fundamentals (Must-Have)
You understand the theory behind what you build and can choose appropriate methods.
Examples:
- Evaluation metrics (precision, recall, F1, etc.)
- Ranking & recommendation concepts
- Embeddings and similarity
- Experimentation methodology
Not required:
- Academic publications
- Advanced theoretical math
- Large-scale model training experience
2️⃣ Evaluation-Driven Mindset (Most Important Signal)
You:
- Think in metrics and baselines
- Design experiments instead of guessing
- Measure improvements quantitatively
- Debug failures methodically
This is the most important quality for this role.
3️⃣ Experience with LLM Systems
You’ve worked with:
- Prompt design
- Agent workflows
- Evaluating LLM outputs
- Production LLM integrations
4️⃣ Ability to Ship Production Systems
You can:
- Turn ideas into working systems
- Iterate based on results
- Balance exploration with delivery
5️⃣ Programming Ability
You’re comfortable writing production code in at least one language (Python, Go, or similar) and learning others as needed.
Strong Signals (Nice to Have)
- Experience improving AI systems post-deployment
- Recommendation or ranking system experience
- Optimization / constraint-based systems
- Computer vision experience
- Experience building evaluation frameworks
- Go experience
- Startup or small-team engineering background
Ideal Candidate
- 5+ years of experience in AI/ML (exceptionally strong candidates with 3+ years will be considered)
- Strong English communication skills
- Comfortable working in a fast-moving environment
- Focused on measurable impact, not just models
Required Languages
- English: B2 (Upper-Intermediate)
- Ukrainian: Native