Commit Offshore

Manual and Automation QA Engineer (LLM Conversational Systems)

We’re looking for a Manual and Automation QA Engineer to ensure the quality, correctness, and reliability of our LLM-based conversational product, from UI flows and data connectors to reasoning and response validation. You’ll design test strategies that catch regressions in conversation behavior, verify data-grounded answers against source datasets, and build automation to scale coverage.


This role blends manual QA, automation engineering, and AI conversational validation.


Responsibilities

  • Own end-to-end QA for an LLM-driven conversational system: functional, regression, and exploratory testing.
  • Build and maintain automated test suites in Python, with a strong focus on Playwright for UI and workflow automation.
  • Design and execute AI conversational tests:
    • Create structured test prompts and multi-turn scenarios
    • Run queries over datasets and validate correctness of responses against expected ground truth
    • Detect and document hallucinations, inconsistencies, and reasoning failures
  • Develop test scenarios to validate data correctness and reasoning validity, including edge cases and adversarial prompts (within product scope).
  • Write and validate SQL queries to confirm the correctness of underlying data and outputs.
  • Validate integrations and storage across MongoDB, MySQL, and SQLite.
  • Define and improve QA processes: test plans, defect triage, release sign-off criteria, and reporting dashboards.
  • Collaborate closely with Product, Engineering, and ML teams to improve testability, reliability, and release confidence.
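The multi-turn scenario testing described above can be sketched as a small harness: feed each user turn to the model along with the conversation history, then check every reply against expected ground-truth substrings. This is a minimal illustration, not our actual framework; `fake_model` is a hypothetical stand-in for the real LLM endpoint, and the canned answers are invented for the example.

```python
def fake_model(history):
    """Hypothetical stand-in for the LLM: answers from a canned lookup."""
    canned = {
        "How many orders were placed in 2023?": "1,204 orders were placed in 2023.",
        "And in 2022?": "980 orders were placed in 2022.",
    }
    return canned.get(history[-1]["content"], "I don't know.")

def run_scenario(model, turns):
    """Run a multi-turn scenario: send each user message with full history,
    record the reply, and collect any turns whose reply misses the
    expected ground-truth substring."""
    history, failures = [], []
    for user_msg, expected in turns:
        history.append({"role": "user", "content": user_msg})
        reply = model(history)
        history.append({"role": "assistant", "content": reply})
        if expected not in reply:
            failures.append((user_msg, expected, reply))
    return failures

scenario = [
    ("How many orders were placed in 2023?", "1,204"),
    ("And in 2022?", "980"),  # follow-up relies on context retention
]
failures = run_scenario(fake_model, scenario)
assert not failures, failures
```

A real suite would replace `fake_model` with the product's chat API and grow the substring check into semantic or dataset-backed validation, but the scenario-as-data shape stays the same.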


Requirements (Must-have)

  • Strong Python programming skills for test automation and tooling.
  • Hands-on experience with Playwright for automated testing.
  • Experience testing AI conversational systems, including:
    • Designing prompt suites over datasets
    • Verifying correctness/consistency of responses
    • Validating multi-turn behavior and context retention
  • Strong manual QA ability for LLM-based conversational flows (UX + functional correctness).
  • Ability to write and validate SQL queries for verification and debugging.
  • Experience with databases: MongoDB, MySQL, SQLite.
  • Strong test scenario design skills, especially for data correctness and reasoning validation.
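Verifying data correctness with SQL, as required above, often means comparing a figure the conversational system reported against the ground truth in the database. A minimal sketch using Python's built-in `sqlite3` module (the table and values are invented for the example; in practice the reported number would be parsed from the model's answer):

```python
import sqlite3

# In-memory database standing in for the product's data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO orders (year) VALUES (?)",
                 [(2023,)] * 3 + [(2022,)] * 2)

# Ground truth straight from SQL...
(actual,) = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE year = 2023").fetchone()

# ...compared against the count the conversational system reported
# (hard-coded here for illustration).
reported = 3
assert reported == actual, f"mismatch: model said {reported}, DB has {actual}"
conn.close()
```

The same pattern applies to MongoDB or MySQL with the respective client library swapped in.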


Nice-to-have

  • Experience with Elasticsearch (queries, relevance checks, troubleshooting).
  • Familiarity with Cursor or similar AI coding assistants to speed up test development and debugging.

Required languages

English: B2 (Upper Intermediate)
Published 4 March