AI Evaluation / QA Engineer

Unilime Verified Employer

Hey-hey!

We are looking for an AI Evaluation & QA Engineer with 2–4 years of experience to join our product and engineering team. You will be responsible for testing, validating, and improving Dash’s retrieval-augmented generation (RAG) engine and AI agent systems. This role is hands-on and technically focused, ideal for someone with strong Python skills, AI evaluation experience, and keen attention to detail.

Key Responsibilities

Design and implement automated and manual QA pipelines for RAG engine components and AI agent behavior.
Develop metrics and benchmarks for evaluating retrieval accuracy, generation relevance, and agent reliability, including faithfulness, context precision, retrieval recall, factual consistency, answer completeness, and alignment with user intent.
Utilize AI evaluation frameworks such as Ragas, LangSmith, Weights & Biases, or similar tools to measure and improve RAG pipeline performance.
Build Python scripts and utilities to test, monitor, and analyze retrieval and response quality at scale.
Collaborate with AI engineers and data scientists to identify edge cases, hallucinations, and inconsistencies in retrieval and generation flows.
Create synthetic and real-world test datasets for evaluating document retrieval, context injection, and agent reasoning quality.
Develop dashboards and reports for tracking performance and monitoring regressions.
Contribute to continuous improvement of QA processes, retrieval versioning, and test automation.
Demonstrate adaptability and initiative in a startup environment, contributing to product success beyond defined responsibilities.

Qualifications

2–4 years of professional experience in software QA, AI/ML evaluation, or intelligent system testing.
Proficiency in Python and scripting for automation and data analysis.
Hands-on experience with AI evaluation tools (e.g., Ragas, LangChain evaluation tools, Weights & Biases, PromptLayer).
Solid understanding of retrieval-augmented generation (RAG) systems, vector databases, and embedding evaluation.
Experience with modern MLOps tools, CI/CD systems (e.g., Jenkins), and version control (e.g., GitHub).
Strong analytical and problem-solving skills, with a data-driven approach to quality assurance.
Experience in user feedback loop integration and reinforcement learning from human feedback (RLHF).
Ability to collaborate effectively with AI engineers, data scientists, and product managers.

Nice to Have

Familiarity with retrieval pipelines, document ranking, and semantic search systems.
Exposure to real estate or SaaS platforms.

Interested? Apply now!

Required languages: Ukrainian (native)

The job ad is no longer active

Only from 3 years of experience
Full Remote
Countries of Europe or Ukraine

QA Manual

Employment: Full-time
Domain: Fintech
Outsource
Test task is needed
