Commit Offshore

Manual and Automation QA Engineer (LLM Conversational Systems)

We’re looking for a Manual and Automation QA Engineer to ensure the quality, correctness, and reliability of our LLM-based conversational product, from UI flows and data connectors to reasoning and response validation. You’ll design test strategies that catch regressions in conversation behavior, verify data-grounded answers against source datasets, and build automation to scale coverage.


This role blends manual QA, automation engineering, and AI conversational validation.


Responsibilities

  • Own end-to-end QA for an LLM-driven conversational system: functional, regression, and exploratory testing.
  • Build and maintain automated test suites in Python, with a strong focus on Playwright for UI and workflow automation.
  • Design and execute AI conversational tests:
    • Create structured test prompts and multi-turn scenarios
    • Run queries over datasets and validate correctness of responses against expected ground truth
    • Detect and document hallucinations, inconsistencies, and reasoning failures
  • Develop test scenarios to validate data correctness and reasoning validity, including edge cases and adversarial prompts (within product scope).
  • Write and validate SQL queries to confirm the correctness of underlying data and outputs.
  • Validate integrations and storage across MongoDB, MySQL, and SQLite.
  • Define and improve QA processes: test plans, defect triage, release sign-off criteria, and reporting dashboards.
  • Collaborate closely with Product, Engineering, and ML teams to improve testability, reliability, and release confidence.
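The multi-turn scenario testing described above can be sketched as a small harness: feed each user turn to the model along with the conversation history, then check every reply against expected ground-truth substrings. This is a minimal illustration, not our actual framework; `fake_model` is a hypothetical stand-in for the real LLM endpoint, and the canned answers are invented for the example.

```python
def fake_model(history):
    """Hypothetical stand-in for the LLM: answers from a canned lookup."""
    canned = {
        "How many orders were placed in 2023?": "1,204 orders were placed in 2023.",
        "And in 2022?": "980 orders were placed in 2022.",
    }
    return canned.get(history[-1]["content"], "I don't know.")

def run_scenario(model, turns):
    """Run a multi-turn scenario: send each user message with full history,
    record the reply, and collect any turns whose reply misses the
    expected ground-truth substring."""
    history, failures = [], []
    for user_msg, expected in turns:
        history.append({"role": "user", "content": user_msg})
        reply = model(history)
        history.append({"role": "assistant", "content": reply})
        if expected not in reply:
            failures.append((user_msg, expected, reply))
    return failures

scenario = [
    ("How many orders were placed in 2023?", "1,204"),
    ("And in 2022?", "980"),  # follow-up relies on context retention
]
failures = run_scenario(fake_model, scenario)
assert not failures, failures
```

A real suite would replace `fake_model` with the product's chat API and grow the substring check into semantic or dataset-backed validation, but the scenario-as-data shape stays the same.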


Requirements (Must-have)

  • Strong Python programming skills for test automation and tooling.
  • Hands-on experience with Playwright for automated testing.
  • Experience testing AI conversational systems, including:
    • Designing prompt suites over datasets
    • Verifying correctness/consistency of responses
    • Validating multi-turn behavior and context retention
  • Strong manual QA ability for LLM-based conversational flows (UX + functional correctness).
  • Ability to write and validate SQL queries for verification and debugging.
  • Experience with databases: MongoDB, MySQL, SQLite.
  • Strong test scenario design skills, especially for data correctness and reasoning validation.
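Verifying data correctness with SQL, as required above, often means comparing a figure the conversational system reported against the ground truth in the database. A minimal sketch using Python's built-in `sqlite3` module (the table and values are invented for the example; in practice the reported number would be parsed from the model's answer):

```python
import sqlite3

# In-memory database standing in for the product's data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO orders (year) VALUES (?)",
                 [(2023,)] * 3 + [(2022,)] * 2)

# Ground truth straight from SQL...
(actual,) = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE year = 2023").fetchone()

# ...compared against the count the conversational system reported
# (hard-coded here for illustration).
reported = 3
assert reported == actual, f"mismatch: model said {reported}, DB has {actual}"
conn.close()
```

The same pattern applies to MongoDB or MySQL with the respective client library swapped in.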


Nice-to-have

  • Experience with Elasticsearch (queries, relevance checks, troubleshooting).
  • Familiarity with Cursor or similar AI coding assistants to speed up test development and debugging.

Required languages

English: B2 (Upper Intermediate)
Published 4 March