Who:
We're looking for a Senior AI/ML Engineer who combines strong machine learning fundamentals with a disciplined, evaluation-first approach to improving production AI systems.
What:
You'll improve real AI systems used by real users through metrics, experimentation, system design, and iteration.
When:
Start ASAP.
Where:
Remote (any location).
Type:
Full-time.
English:
Upper-intermediate or Advanced required.
The Role in a Nutshell
This is not a pure research position.
You'll work on improving production AI systems through structured evaluation, diagnosing failures, refining prompts and agent workflows, and designing measurable improvements.
You won't be working in isolation: you'll improve systems that users depend on daily.
Responsibility Breakdown (Approximate)
- AI evaluation & KPI design: ~30%
- Prompt and multi-agent system design: ~30%
- ML systems (recommendation, optimization, etc.): ~30%
- Engineering integration: ~10%
What You'll Work On
AI Evaluation & System Quality (Core Responsibility)
- Design evaluation strategies for LLM and agent workflows
- Define metrics and KPIs for AI system performance
- Build and maintain evaluation datasets
- Systematically debug production AI failures
- Compare system behavior against baselines
- Run structured experiments and measure real impact
This is a central part of the role.
Multi-Agent AI Systems
- Improve agent orchestration and workflows
- Diagnose failures across agent pipelines
- Refine system prompts and agent interactions
- Improve reliability, latency, and output quality
Applied ML Areas (Depth > Breadth)
You'll contribute to one or more of the following areas:
- Recommendation systems (ranking & personalization)
- Itinerary optimization & constraint-based planning
- LLM-based reasoning systems
- (Optional) Computer vision pipelines
Depth in at least one of these areas is more important than shallow experience in many.
Engineering Environment
We use:
- Golang (primary production language)
- Python for ML workflows
- Postgres, Redis, internal services
You don't need to be a Go expert on day one, but you must be comfortable reading and modifying production code.
Backend engineers own infrastructure-heavy services; your focus is on AI system behavior, correctness, and measurable improvement.
What We're Looking For
1️⃣ Strong AI/ML Fundamentals (Must-Have)
You understand the theory behind what you build and can choose appropriate methods.
Examples:
- Evaluation metrics (precision, recall, F1, etc.)
- Ranking & recommendation concepts
- Embeddings and similarity
- Experimentation methodology
Not required:
- Academic publications
- Advanced theoretical math
- Large-scale model training experience
2️⃣ Evaluation-Driven Mindset (Most Important Signal)
You:
- Think in metrics and baselines
- Design experiments instead of guessing
- Measure improvements quantitatively
- Debug failures methodically
This is the most important quality for this role.
3️⃣ Experience with LLM Systems
You've worked with:
- Prompt design
- Agent workflows
- Evaluating LLM outputs
- Production LLM integrations
4️⃣ Ability to Ship Production Systems
You can:
- Turn ideas into working systems
- Iterate based on results
- Balance exploration with delivery
5️⃣ Programming Ability
You're comfortable writing production code in at least one language (Python, Go, or similar) and learning others as needed.
Strong Signals (Nice to Have)
- Experience improving AI systems post-deployment
- Recommendation or ranking system experience
- Optimization / constraint-based systems
- Computer vision experience
- Experience building evaluation frameworks
- Golang experience
- Startup or small-team engineering background
Ideal Candidate
- 5+ years of experience in AI/ML (exceptionally strong candidates with 3+ years will also be considered)
- Strong English communication skills
- Comfortable working in a fast-moving environment
- Focused on measurable impact, not just models