AI Evaluation / QA Engineer

Unilime Verified Employer

Hey-hey! 

We are looking for an AI Evaluation & QA Engineer with 2–4 years of experience to join our product and engineering team. You will be responsible for testing, validating, and improving Dash’s retrieval-augmented generation (RAG) engine and AI agent systems. This hands-on, technically focused role is ideal for someone with strong Python skills, AI evaluation experience, and keen attention to detail.

 

Key Responsibilities

  • Design and implement automated and manual QA pipelines for RAG engine components and AI agent behavior.
  • Develop metrics and benchmarks for evaluating retrieval accuracy, generation relevance, and agent reliability, including faithfulness, context precision, retrieval recall, factual consistency, answer completeness, and alignment with user intent.
  • Use AI evaluation frameworks such as Ragas, LangSmith, Weights & Biases, or similar tools to measure and improve RAG pipeline performance.
  • Build Python scripts and utilities to test, monitor, and analyze retrieval and response quality at scale.
  • Collaborate with AI engineers and data scientists to identify edge cases, hallucinations, and inconsistencies in retrieval and generation flows.
  • Create synthetic and real-world test datasets for evaluating document retrieval, context injection, and agent reasoning quality.
  • Develop dashboards and reports for tracking performance and monitoring regressions.
  • Contribute to the continuous improvement of QA processes, retrieval versioning, and test automation.
  • Demonstrate adaptability and initiative in a startup environment, contributing to product success beyond defined responsibilities.

     

Qualifications

  • 2–4 years of professional experience in software QA, AI/ML evaluation, or intelligent system testing.
  • Proficiency in Python and scripting for automation and data analysis.
  • Hands-on experience with AI evaluation tools (e.g., Ragas, LangChain evaluation tools, Weights & Biases, PromptLayer).
  • Solid understanding of retrieval-augmented generation (RAG) systems, vector databases, and embedding evaluation.
  • Experience with modern MLOps tools, CI/CD systems, and version control (e.g., Jenkins, GitHub).
  • Strong analytical and problem-solving skills, with a data-driven approach to quality assurance.
  • Experience integrating user feedback loops and working with reinforcement learning from human feedback (RLHF).
  • Ability to collaborate effectively with AI engineers, data scientists, and product managers.

     

Nice to Have

  • Familiarity with retrieval pipelines, document ranking, and semantic search systems.
  • Exposure to real estate or SaaS platforms.

 

Interested? Apply now! 

Required languages

Ukrainian: Native

The job ad is no longer active
