Prompt Spec Engineer

Product

We are looking for a Prompt Spec Engineer who will define how our AI systems communicate, reason, and deliver high-quality outputs across products powered by LLMs.

You will be responsible for designing prompt/spec architectures, building evaluation systems, and ensuring that AI responses are consistent, reliable, and production-ready across RAG systems, AI agents, and automation workflows.

What you will do:

Design and evolve prompt/spec architecture for AI products, LLM services, RAG systems, and AI agents, defining prompt structures, model behavior rules, response formats, and quality criteria.
Develop, optimize, and adapt prompts for different LLMs and use cases, considering output quality, context limitations, cost, and system stability.
Build and maintain AI evaluation processes, including test scenarios, golden datasets, regression tests, and A/B experiments, to measure LLM performance.
Analyze AI outputs, identify root causes of issues (hallucinations, missing context, retrieval problems, incorrect or unsafe responses), and define improvements.
Collaborate with AI Engineers and Product Managers to implement prompt solutions into production systems, RAG pipelines, AI agents, and automation workflows, translating business needs into technical specifications.
Work with LLM evaluation and monitoring tools (such as Promptfoo, Langfuse, and similar solutions) to track metrics, prompt versions, response quality, and system performance.
Create and maintain structured documentation: prompt specs, evaluation frameworks, usage guidelines, limitations, and best practices for safe and reliable AI behavior.

What we expect from you:

2+ years of experience in LLM-related roles such as Prompt Engineering, AI Product Operations, AI Evaluation, or similar.
Hands-on experience designing and optimizing prompts for different LLMs, with a strong understanding of model behavior, strengths, and limitations.
Experience building prompt/spec systems, including prompt structure design, behavior rules, response formatting, testing, and versioning.
Strong experience in AI evaluation: test case design, output analysis, issue detection, and iterative improvement of model performance.
Understanding of RAG systems, AI agents, structured outputs, tool calling, and principles of reliable LLM system design.
Ability to collaborate with engineering and product teams, translate business requirements into AI specifications, and document solutions clearly and consistently.
Upper-Intermediate (B2+) English.

What we offer:

Competitive compensation depending on experience
Work on production-level AI systems (LLMs, RAG, AI agents, automation workflows)
Office-based work in Lviv with the possibility of a hybrid schedule
Direct impact on AI product quality and system behavior
Access to modern LLM tools and evaluation frameworks
Professional growth in a fast-moving AI-focused environment
Opportunity to shape prompt engineering standards inside the company
Relocation support for candidates from other cities, including assistance with moving and adaptation in Lviv

Required languages

English B2 - Upper Intermediate

Ukrainian Native

Published 11 June · Updated 13 July

Only from 1.5 years of experience
Office Work
Countries where we consider candidates: Ukraine

ML / AI

Employment: Full-time
Domain: SaaS
Product
Office: Ukraine (Lviv)
