AI Tooling and Infrastructure Engineer
Hi! We're building a dedicated AI division, creating both internal tooling and customer-facing AI products. Startup energy, backed by our established 20+ year tech company. New York and Krakow hubs, horizontal structure, direct impact.
What You'll Do
- Build AI-powered tools for internal teams and external customers
- Work directly with LLMs: prompt engineering, evaluation, quality measurement
- Design multi-model solutions: choosing the right model for the right task
- Create APIs, frameworks, and infrastructure that make AI reliable in production
- Measure and improve AI output quality using evaluation frameworks and human feedback
What We're Looking For
Must have:
- You understand how LLMs behave, not just how to call their APIs
(You can explain why a model hallucinates and what to do about it)
- Hands-on experience with multiple model families
(Claude, GPT, Gemini, DeepSeek, open-source; at least 2-3)
- You've measured AI output quality in production
(Evaluation frameworks, LLM-as-judge, human validation: any approach counts)
- Prompt engineering as a craft, not a buzzword
(You iterate, test, and know why your prompts work)
- Backend development experience (Python preferred, any language accepted)
- 2+ years professional software development
Signs you're a great fit:
- You know the difference between GPT-4o, Claude Sonnet, and Gemini Flash, and when to use which
- You've built something with AI that solves a real problem (not just a tutorial)
- You've improved AI output quality and can tell us how you measured it
Nice to have (we'll teach the rest):
- Experience building developer tools or frameworks
- Multi-agent systems (LangGraph, CrewAI, AutoGen)
- Knowledge of evaluation tooling (LangSmith, Ragas, DeepEval, custom solutions)
- Background in Java or high-performance systems
Required skills and experience
| LLM | 1 year |
| AI/ML | 1 year |
| Prompt Engineering | 1 year |
| AI Agents | 1 year |
| Python | 1 year |
Required languages
| English | C1 - Advanced |