ML Evaluation Engineer (LLM / Data Quality)

ITG

About the Role

We’re looking for a Machine Learning Engineer focused on evaluation and data quality to help improve the performance of modern ML and LLM systems.

This role centers on understanding model behavior, identifying failure patterns, and building strong evaluation and feedback loops. You won’t be training models directly; instead, you’ll play a critical role in improving model outcomes through better data, analysis, and experimentation.

If you enjoy analytical problem-solving, working with real-world data, and making ML systems measurably better, this role is for you.

What You’ll Do

Analyze and improve datasets used in ML and LLM workflows
Perform detailed error analysis on model outputs (qualitative and quantitative)
Design and implement evaluation frameworks, benchmarks, and quality checks
Identify failure modes, data gaps, and improvement opportunities
Support model experimentation with structured insights and feedback
Collaborate closely with ML engineers and researchers to improve system performance
Document evaluation methodologies, findings, and recommendations
Help maintain high standards for data quality, consistency, and bias awareness

What We’re Looking For

5+ years of experience in Machine Learning or Applied ML roles
Strong understanding of ML fundamentals and evaluation methodologies
Hands-on experience working with real-world datasets for ML or LLM systems
Strong Python skills
Experience with data validation, evaluation pipelines, and error analysis
Strong analytical thinking and attention to detail
Ability to work independently and clearly communicate insights

Nice to Have

Experience with LLMs, NLP, or generative AI systems
Background in data annotation, QA, or evaluation-heavy ML environments
Experience with experiment tracking, prompt evaluation, or benchmark design
Exposure to bias analysis, robustness testing, or dataset auditing
Research or competition-based ML background

Why This Role

Work on cutting-edge ML and LLM systems
Direct impact on model quality and real-world performance
High ownership in shaping evaluation and data practices
Collaborative, fast-moving engineering environment

Required languages

English C1 - Advanced

The job ad is no longer active

Experience: 5+ years
Full Remote
Countries where we consider candidates: Europe or Ukraine

ML / AI

Employment: Full-time
Domain: Machine Learning / Big Data
Outsource

📊 Average salary range of similar jobs in analytics →