AI Evaluation & QA Engineer
Hey-hey!
We are looking for an AI Evaluation & QA Engineer with 2–4 years of experience to join our product and engineering team. You will be responsible for testing, validating, and improving Dash’s retrieval-augmented generation (RAG) engine and AI agent systems. This role is hands-on and technically focused, ideal for someone with strong Python skills, AI evaluation experience, and keen attention to detail.
Key Responsibilities
- Design and implement automated and manual QA pipelines for RAG engine components and AI agent behavior.
- Develop metrics and benchmarks for evaluating retrieval accuracy, generation relevance, and agent reliability, including faithfulness, context precision, retrieval recall, factual consistency, answer completeness, and alignment with user intent.
- Utilize AI evaluation frameworks such as Ragas, LangSmith, Weights & Biases, or similar tools to measure and improve RAG pipeline performance.
- Build Python scripts and utilities to test, monitor, and analyze retrieval and response quality at scale.
- Collaborate with AI engineers and data scientists to identify edge cases, hallucinations, and inconsistencies in retrieval and generation flows.
- Create synthetic and real-world test datasets for evaluating document retrieval, context injection, and agent reasoning quality.
- Develop dashboards and reports for tracking performance and monitoring regressions.
- Contribute to continuous improvement of QA processes, retrieval versioning, and test automation.
- Demonstrate adaptability and initiative in a startup environment, contributing to product success beyond defined responsibilities.
Qualifications
- 2–4 years of professional experience in software QA, AI/ML evaluation, or intelligent system testing.
- Proficiency in Python and scripting for automation and data analysis.
- Hands-on experience with AI evaluation tools (e.g., Ragas, LangChain evaluation tools, Weights & Biases, PromptLayer).
- Solid understanding of retrieval-augmented generation (RAG) systems, vector databases, and embedding evaluation.
- Experience with modern MLOps tools, CI/CD systems (e.g., Jenkins), and version control (e.g., GitHub).
- Strong analytical and problem-solving skills, with a data-driven approach to quality assurance.
- Experience integrating user feedback loops and working with reinforcement learning from human feedback (RLHF).
- Ability to collaborate effectively with AI engineers, data scientists, and product managers.
Nice to Have
- Familiarity with retrieval pipelines, document ranking, and semantic search systems.
- Exposure to real estate or SaaS platforms.
Interested? Apply now!
Required languages
- Ukrainian: Native
The job ad is no longer active