Senior AI Engineer
Role in one line
Build the agent runtime (the brains of the platform) in Python: stateful LangGraph workflows, RAG, tool integrations, model routing, human-in-the-loop gates, and audit-grade logging.
Context
We are building a multi-agent AI platform for a regulated banking client. The agent runtime is a Python / FastAPI service exposing OpenAI-compatible endpoints, with agents implemented as LangGraph graphs and tools exposed over MCP. Sensitive workloads (KYC, PII) run on self-hosted open-weight models on EU infrastructure (Hetzner); less sensitive workloads route to frontier models via AWS Bedrock EU. The AI engineer owns the agent logic, retrieval, and the model / tool layer.
What you will work on
- Design and build stateful agent graphs in LangGraph: multi-stage workflows, tool calling, and interrupt-based human-in-the-loop gates (e.g. the six-stage KYC assistant).
- Build RAG pipelines: ingestion, chunking, embeddings, retrieval, and grounded Q&A with source citations (e.g. meeting analysis, search lens).
- Implement structured-output patterns where the LLM emits validated JSON only and deterministic engines do the rest, keeping the model out of the render path for document generation.
- Integrate tools over MCP (streamable-http) and wire agents to bank data sources behind the tool layer.
- Work across hosted and self-hosted models (a per-task model router selects between Bedrock EU and on-prem open-weight models), with prompt design, evaluation, and cost / latency awareness.
- Build audit-grade logging and provenance for every AI operation, with PII handled by hashing rather than plaintext.
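To give a flavour of the last two items, here is a minimal sketch of per-task routing combined with an audit record that stores a keyed hash instead of plaintext PII. It is an illustration only, not the platform's actual code: the backend labels, the `SENSITIVE_TASKS` set, the `AuditRecord` fields, and all function names are assumptions made for this example, and it uses only the Python standard library.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical backend labels; real deployment names are not part of this posting.
SELF_HOSTED = "self-hosted-open-weight"
BEDROCK_EU = "bedrock-eu"

# Assumption: KYC and PII tasks count as sensitive, as in the Context section.
SENSITIVE_TASKS = {"kyc", "pii"}


def route_model(task_type: str) -> str:
    """Pick a backend per task: sensitive work stays on self-hosted models."""
    return SELF_HOSTED if task_type.lower() in SENSITIVE_TASKS else BEDROCK_EU


def hash_pii(value: str, key: bytes) -> str:
    """Keyed SHA-256 (HMAC) so the raw value never appears in the log."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    task_type: str
    backend: str
    subject_hash: str


def audit(task_type: str, customer_id: str, key: bytes) -> str:
    """Return one JSON log line for an AI operation, with the subject hashed."""
    record = AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        task_type=task_type,
        backend=route_model(task_type),
        subject_hash=hash_pii(customer_id, key),
    )
    return json.dumps(asdict(record), sort_keys=True)
```

A keyed hash is used here rather than a bare SHA-256 because low-entropy identifiers can otherwise be recovered by brute force; whether the real system does the same is not stated in this posting.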
Must-have
- Strong Python: production-grade, typed, tested, not notebook-only.
- LangChain and LangGraph: hands-on building stateful graphs, tool / function calling, and human-in-the-loop interrupts.
- LLM application patterns: RAG, prompt design, structured output / JSON-schema validation, and evaluation of agent behaviour.
- Retrieval stack: vector stores (pgvector and / or Qdrant) and embeddings.
- Serving: FastAPI for agent endpoints; comfort with async Python.
- Model access: working with hosted APIs (AWS Bedrock) and self-hosted open-weight models (Llama, Mistral, Qwen).
Nice-to-have
- Banking / regulated experience; awareness of audit logging, PII handling, and EU data residency.
- MCP (Model Context Protocol) tool integration.
- ASR / speech-to-text pipelines (for meeting analysis).
- MLOps for self-hosted inference (vLLM, Ollama, or similar) on EU infrastructure (Hetzner).
- Workflow orchestration (e.g. Camunda) and deterministic document rendering (docxtpl, openpyxl).
Tech stack you will touch
Python, LangChain, LangGraph, FastAPI · pgvector / Qdrant, embeddings · AWS Bedrock EU and self-hosted open-weight LLMs on Hetzner · MCP servers (streamable-http) · Docker, Git, CI/CD.
Ways of working
- Remote, distributed delivery team; English working language; scrum-light cadence.
- Banking-grade rigor: every AI operation logged and auditable, human-in-the-loop by design, compliance built into the architecture, not bolted on.
- Agents prepare, retrieve, draft, and propose; the trigger and the final action always stay with a human.
Important
As this is a Germany-based project, we are primarily seeking candidates based in Western Ukraine, with Vinnytsia and Lviv being our preferred locations.
Required languages
| English | B2 - Upper Intermediate |
| Ukrainian | Native |