Agentic AI Process Engineer
This is not a coding role. This is a systems-of-systems role.
If you build things with AI, this posting isn't for you.
If you build the systems that govern how AI builds things, keep reading.
What This Role Is
LifeChef is a HIPAA-compliant healthcare platform delivering Medically Tailored Meals. Our engineering team builds on a methodology called Enso: a spec-driven, AI-augmented development framework where every feature starts with a specification, gets built with AI assistance, and is validated against defined criteria.
Enso v1 is linear: spec → generate → validate → ship. It works. But it's hitting scaling limits: the document graph grows with every feature, validation complexity compounds, and the process requires increasing human oversight to maintain quality. This is a familiar problem in any AI-assisted workflow that outgrows its initial architecture.
We need someone to take Enso from v1 to v2: a true agentic system where specialized AI agents handle research, specification validation, code generation oversight, quality assurance, and deviation detection. Agents operate with structured autonomy, clear escalation paths, and human-in-the-loop at defined decision points, not everywhere.
You would partner directly with the CTO (who designed the current methodology and has strong opinions about where it needs to go) to architect, build, and maintain this system. The first 30 days are heavy collaboration: translating vision into working architecture. After that, you own the evolution of the process while the CTO focuses on product and team.
What You'd Actually Build and Maintain
- Agent topology for the development lifecycle: research agents, spec validation agents, code review agents, compliance checking agents, red-team/adversarial agents, each with defined boundaries, confidence thresholds, and escalation triggers
- Spec-to-code reconciliation system: agents that validate whether implemented code actually satisfies the specification, flagging deviations and ambiguities rather than silently passing
- Multi-model orchestration strategy: routing different tasks to different models based on cost, capability, and risk (expensive models for reasoning, cheap models for classification, open-source where it fits)
- Structured output validation: ensuring agent outputs are parseable, auditable, and actionable, not just "LLM said it's fine"
- Governance and escalation framework: what agents decide autonomously vs. what requires human sign-off, circuit breakers for low-confidence situations, audit logging for compliance
- Evolution engine: the system should improve itself, tracking where agents fail, where human overrides happen, and feeding that back into prompt refinement and architecture changes
- Integration with existing CI/CD: the agentic layer hooks into GitHub Actions, automated testing (Vitest), PHI scanning, secret detection, and compliance validation already in the pipeline
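To make the governance and structured-output points above concrete, here is a minimal illustrative sketch (not LifeChef's actual implementation; the threshold, field names, and routing labels are all placeholders) of a confidence-gated escalation decision:

```python
# Illustrative sketch of a governance gate: agent verdicts are structured
# records, and low confidence or spec deviation always escalates to a human.
from dataclasses import dataclass


@dataclass
class AgentVerdict:
    agent: str          # which agent produced this verdict
    passed: bool        # did the artifact satisfy its spec?
    confidence: float   # self-reported confidence, 0.0 to 1.0
    notes: str          # structured reasoning, retained for the audit log


CONFIDENCE_FLOOR = 0.8  # hypothetical threshold; tuned per agent in practice


def route(verdict: AgentVerdict) -> str:
    """Decide whether a verdict is acted on autonomously or escalated."""
    if verdict.confidence < CONFIDENCE_FLOOR:
        return "escalate:low-confidence"   # circuit breaker: human sign-off
    if not verdict.passed:
        return "escalate:spec-deviation"   # deviations are flagged, never silent
    return "auto-approve"


# A passing, high-confidence verdict proceeds without human review:
print(route(AgentVerdict("spec-validator", True, 0.95, "all criteria met")))
# -> auto-approve
```

The point of the sketch is the shape, not the logic: every decision is a typed, auditable record, and autonomy is the narrow path through explicit gates rather than the default.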
This is infrastructure that the development team operates within. You're not shipping product features โ you're shipping the machine that makes product development faster, safer, and more consistent.
What We Need From You
Hard requirements:
- You have designed or built at least one agentic system where AI agents operate with structured autonomy: multi-step workflows with validation, error handling, and escalation. Not just API wrappers. Not just chatbots.
- You understand agent topology as a design problem: which agents need to exist, what their boundaries are, how they communicate, how you prevent cascading failures.
- You have hands-on experience with LLM orchestration frameworks (LangChain, CrewAI, AutoGen, custom implementations; the specific tool matters less than the depth of understanding).
- You think in terms of eval frameworks and validation: how do you know if an agent's output is good? How do you measure drift? How do you build confidence scoring that's actually useful?
- You understand multi-model strategies: when to use which model, cost/capability tradeoffs, how to tier your model usage so you're not burning expensive tokens on classification tasks.
- You can work in both Python and TypeScript, or at minimum are comfortable in one and willing to operate in the other. Our product stack is TypeScript-first but agentic tooling often lives in Python.
- Strong written communication. This role involves heavy async documentation of architectural decisions, process design, and system behavior.
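The model-tiering point above can be sketched in a few lines. This is a hypothetical illustration: the tier names, task labels, model names, and prices are placeholders, not a recommendation.

```python
# Hypothetical model-tiering router: each task class maps to a tier, so
# high-volume classification never burns reasoning-tier tokens.
TIERS = {
    "reasoning":      {"model": "large-reasoning-model", "usd_per_1k_tokens": 0.0150},
    "generation":     {"model": "mid-tier-model",        "usd_per_1k_tokens": 0.0030},
    "classification": {"model": "small-open-model",      "usd_per_1k_tokens": 0.0002},
}

TASK_TIER = {
    "spec-validation": "reasoning",       # high-risk judgment calls
    "code-review":     "generation",
    "pii-triage":      "classification",  # cheap, high-volume labeling
}


def pick_model(task: str) -> str:
    """Return the model for a task's tier; unknown tasks fail safe upward."""
    tier = TASK_TIER.get(task, "reasoning")
    return TIERS[tier]["model"]


print(pick_model("pii-triage"))  # -> small-open-model
```

Note the failure mode baked in: a task the router has never seen defaults to the most capable (and most expensive) tier, because misrouting a risky task to a cheap model costs more than the tokens saved.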
What sets you apart:
- Experience with Model Context Protocol (MCP): building custom MCP servers, tool-use architectures
- Claude Code, Cursor, or similar AI-native development tools as part of your own workflow (you should be using what you're building systems around)
- Experience in healthcare, fintech, or regulated environments where AI outputs must be auditable and compliant
- Understanding of prompt engineering at the systems level: not "write a good prompt" but "design a prompt architecture that's maintainable, versionable, and testable"
- You've thought seriously about adversarial validation: red-team agents, cross-model verification, structured disagreement as a quality signal
- You have opinions about where agentic AI is overhyped and where it genuinely works, and you can defend those opinions with specifics
What we explicitly don't require:
- A portfolio of 50 agentic projects. The field is young. One deep project with real orchestration depth beats ten shallow ones.
- Product development skills. You won't be building UI or shipping user-facing features.
- Healthcare domain expertise. We'll teach you the domain. You teach us the orchestration.
How We Evaluate
- Screening questions (in your application): we're looking for specificity and honesty, not keyword density.
- Architecture conversation with CTO (90 min): we'll describe our current system, its limitations, and where we want to go. You'll tell us how you'd approach it. This is a two-way design conversation, not a quiz.
- Small design exercise (async, take-home): given a simplified version of our spec validation problem, design an agent topology. We're evaluating how you think about boundaries, failure modes, and escalation, not production-ready code.
- Trial sprint (60 to 90 days): heavy pairing with the CTO in the first 30 days, increasing autonomy after. Defined milestones at 30, 60, and 90 days.
The Team and Context
Small, senior engineering team, primarily Ukraine-based. The CTO (Jay) designed the Enso methodology, has 20+ years of engineering experience across healthcare, threat intelligence, and enterprise platforms, and has built the governance architecture (constitutions, spec-gates, validation layers) that you'd be evolving. He's technical, direct, and has strong opinions, but he's hiring you because he needs a partner in this, not an executor.
The product stack is Next.js 16, React 19, TypeScript, PostgreSQL, AWS serverless (Lambda, CDK, Amplify Gen 2), with Vitest testing, Pino structured logging, and HIPAA compliance enforced in CI. You don't need to know this stack deeply, but you need to understand it well enough to build agentic tooling that integrates with it.
We're building toward autonomous agentic systems for clinical operations; this development process work is the foundation that makes that possible. If the idea of designing the system that governs how an engineering team builds software excites you, this is the role.
How to Apply
Reply with:
- The most complex agentic or LLM orchestration system you've built: what it did, how it was structured, what you learned
- Your honest take on where agentic AI is right now and what's actually hard
- What interests you about building process infrastructure vs. product features
Generic responses get skipped. We know the field is young โ we'd rather hear about one deep experience than a list of tools you've touched.
Required skills and experience

| Skill | Experience |
| --- | --- |
| Python | 3 years |
| TypeScript | 2 years |
| AI/ML | 2 years |

Required domain experience

| Domain | Experience |
| --- | --- |
| Healthcare / MedTech | 6 months |

Required languages

| Language | Level |
| --- | --- |
| English | B1 - Intermediate |
| Ukrainian | Native |