Prompt Spec Engineer
Product
We are looking for a Prompt Spec Engineer who will define how our AI systems communicate, reason, and deliver high-quality outputs across products powered by LLMs.
You will be responsible for designing prompt/spec architectures, building evaluation systems, and ensuring that AI responses are consistent, reliable, and production-ready across RAG systems, AI agents, and automation workflows.
What you will do:
- Design and evolve prompt/spec architecture for AI products, LLM services, RAG systems, and AI agents, defining prompt structures, model behavior rules, response formats, and quality criteria.
- Develop, optimize, and adapt prompts for different LLMs and use cases, considering output quality, context limitations, cost, and system stability.
- Build and maintain AI evaluation processes including test scenarios, golden datasets, regression testing, and A/B experiments for LLM performance measurement.
- Analyze AI outputs, identify root causes of issues (hallucinations, missing context, retrieval problems, incorrect or unsafe responses), and define improvements.
- Collaborate with AI Engineers and Product Managers to translate business needs into technical specifications and integrate prompt solutions into production systems, RAG pipelines, AI agents, and automation workflows.
- Work with LLM evaluation and monitoring tools (such as PromptFoo, Langfuse, and similar solutions) to track metrics, prompt versions, response quality, and system performance.
- Create and maintain structured documentation: prompt specs, evaluation frameworks, usage guidelines, limitations, and best practices for safe and reliable AI behavior.
What we expect from you:
- 2+ years of experience in LLM-related roles such as Prompt Engineering, AI Product Operations, AI Evaluation, or similar.
- Hands-on experience designing and optimizing prompts for different LLMs with a strong understanding of model behavior, strengths, and limitations.
- Experience building prompt/spec systems including prompt structure design, behavior rules, response formatting, testing, and versioning.
- Strong experience in AI evaluation: test case design, output analysis, issue detection, and iterative improvement of model performance.
- Understanding of RAG systems, AI agents, structured outputs, tool calling, and principles of reliable LLM system design.
- Ability to collaborate with engineering and product teams, translate business requirements into AI specifications, and document solutions clearly and consistently.
- Upper-Intermediate (B2+) English
What we offer:
- Competitive compensation depending on experience
- Work on production-level AI systems (LLMs, RAG, AI agents, automation workflows)
- Office-based work in Lviv with the possibility of a hybrid schedule.
- Direct impact on AI product quality and system behavior
- Access to modern LLM tools and evaluation frameworks
- Professional growth in a fast-moving AI-focused environment
- Opportunity to shape prompt engineering standards inside the company
- Relocation support for candidates from other cities, including assistance with moving and adaptation in Lviv.
Required languages
| English | B2 - Upper Intermediate |
| Ukrainian | Native |
Published 11 June