AI Vibe Developer
Open-Source LLM Training & Agentic Product Engineering (Remote)
Role Summary
We’re hiring an AI Vibe Developer — a product-minded engineer who ships production-quality software quickly by combining strong engineering fundamentals with AI-assisted (“vibe coding”) workflows and modern LLMOps best practices (evals, tracing, guardrails, prompt/version management).
A core requirement of this role is working with an open-source / open-weights LLM that we will fine-tune and run ourselves. You’ll help take it end-to-end — from data → training → evaluation → serving → iteration — and integrate it into real, user-facing product flows.
Responsibilities
1) Build & ship AI-powered product features (end-to-end)
- Deliver features from prototype to production: UI, backend, integrations, deployments, and iteration.
- Use AI coding tools/agents to accelerate delivery while maintaining code quality (tests, reviews, standards).
- Own rollout strategy: feature flags, staged releases, rollback plans, and production support.
2) Design agentic workflows and “AI inside” capabilities
- Implement LLM features such as:
  - Tool/function calling with strict schemas and reliable fallbacks (see the sketch after this list).
  - Retrieval-Augmented Generation (RAG) when needed (grounding, freshness, controllable outputs).
  - Multi-step agent flows (routing, planner/executor patterns, memory/context strategies).
- Integrate external tools/services securely (internal APIs, CMS, analytics, ticketing, CRM, etc.).
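As a rough illustration of the tool-calling item above, here is a minimal Python sketch of schema-validated tool arguments with a reliable fallback, assuming Pydantic v2 for validation. `call_llm` and `search_tickets` are hypothetical placeholders, not part of any existing internal API.

```python
"""Minimal sketch: strict tool-argument schema + safe fallback.

`call_llm` and `search_tickets` are hypothetical placeholders (assumptions),
not part of any stack described in this posting.
"""
from pydantic import BaseModel, Field, ValidationError


class SearchTicketsArgs(BaseModel):
    """Strict schema for a hypothetical `search_tickets` tool."""
    query: str = Field(min_length=1)
    limit: int = Field(default=5, ge=1, le=50)


def call_llm(prompt: str) -> str:
    """Placeholder for a call to the self-hosted model; assumed to return JSON text."""
    raise NotImplementedError


def run_tool_call(user_message: str) -> dict:
    raw = call_llm(
        "Return ONLY JSON matching this schema "
        f"{SearchTicketsArgs.model_json_schema()} for: {user_message}"
    )
    try:
        args = SearchTicketsArgs.model_validate_json(raw)
    except ValidationError:
        # Reliable fallback: never pass unvalidated arguments to a tool.
        return {"status": "fallback", "detail": "could not parse tool arguments"}
    return {"status": "ok", "tool": "search_tickets", "args": args.model_dump()}
```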
3) Open-source LLM training & lifecycle ownership
- Help select the best open-weights model baseline based on quality, licensing, latency, context length, and multilingual support.
- Build and maintain the data pipeline: collection, cleaning, de-duplication, PII redaction, formatting, and dataset versioning.
- Run fine-tuning workflows (e.g., SFT and preference tuning when relevant) with reproducibility and experiment tracking (see the sketch after this list).
- Build a practical evaluation suite (golden sets + edge/adversarial cases) to gate training runs and releases.
- Drive iterative improvement using an error taxonomy: failures → targeted data improvements → retrain → regression checks → release notes.
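A minimal sketch of what a reproducible SFT run could look like, assuming a Hugging Face Transformers/Datasets stack (a common choice, not a stated requirement); the model id, data paths, and hyperparameters are placeholders.

```python
"""Minimal SFT sketch on an assumed Hugging Face stack; all names are placeholders."""
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "our-chosen-open-weights-model"  # placeholder, not a real model id

tok = AutoTokenizer.from_pretrained(BASE_MODEL)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # common workaround for causal-LM tokenizers

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# One JSON line per example, with a "text" field holding the formatted sample.
ds = load_dataset("json", data_files={"train": "data/sft_train.jsonl"})["train"]


def tokenize(example):
    return tok(example["text"], truncation=True, max_length=2048)


tokenized = ds.map(tokenize, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="runs/sft-v1",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
        logging_steps=20,
        seed=42,            # pin the seed for reproducibility
        report_to="none",   # swap for an experiment tracker in real runs
    ),
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM labels (next-token prediction).
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
trainer.save_model("runs/sft-v1/final")
```

In practice, the evaluation suite described above would gate whether a checkpoint like `runs/sft-v1/final` is promoted to serving.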
4) Inference, serving, and performance optimization
- Package and deploy the model for production serving (containerized) with reliability and observability.
- Optimize serving latency/cost via batching, caching, quantization, and safe model routing where applicable.
- Implement system-level guardrails: schema-constrained outputs, input validation, tool allowlists, rate limits, and safe degradation paths (see the sketch after this list).
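A minimal sketch of those system-level guardrails as a wrapper around the serving call; the tool names, size limit, and fallback reply are illustrative assumptions, and `generate` stands in for the real serving client.

```python
"""Minimal guardrail sketch; all names and limits are illustrative assumptions."""
from dataclasses import dataclass

ALLOWED_TOOLS = {"search_kb", "create_ticket"}   # hypothetical tool allowlist
MAX_INPUT_CHARS = 8_000                          # crude input validation
FALLBACK_REPLY = "Sorry, I can't help with that right now."  # placeholder degradation path


@dataclass
class ModelReply:
    text: str
    requested_tool: str | None = None


def generate(prompt: str) -> ModelReply:
    """Placeholder for a call to the production serving endpoint."""
    raise NotImplementedError


def guarded_generate(user_input: str) -> ModelReply:
    # Input validation: reject empty or oversized requests before they hit the model.
    if not user_input or len(user_input) > MAX_INPUT_CHARS:
        return ModelReply(text=FALLBACK_REPLY)
    try:
        reply = generate(user_input)
    except Exception:
        # Safe degradation: serving errors never surface raw to the user.
        return ModelReply(text=FALLBACK_REPLY)
    # Tool allowlist: ignore any tool the model requests that we did not register.
    if reply.requested_tool and reply.requested_tool not in ALLOWED_TOOLS:
        return ModelReply(text=FALLBACK_REPLY)
    return reply
```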
5) Evals-first development and observability
- Define success metrics (quality, groundedness, refusal behavior, latency, cost, user satisfaction) and track them continuously.
- Instrument LLM flows with tracing (inputs, tool calls, retrieved docs, outputs, model/version, latency, cost); see the trace-record sketch after this list.
- Maintain prompt/version control and changelogs; support A/B tests and controlled rollouts.
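A minimal sketch of the per-call trace record implied by the tracing bullet above, using a plain dataclass and a JSONL file as stand-ins for a real tracing backend; every field name here is illustrative, not a fixed schema.

```python
"""Minimal per-call trace record; field names and the JSONL sink are assumptions."""
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class LLMTrace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    model_version: str = "sft-v1"          # which fine-tune served the request
    prompt_version: str = "support-v3"     # prompt/version-control reference
    user_input: str = ""
    retrieved_docs: list[str] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    output: str = ""
    latency_ms: float = 0.0
    cost_usd: float = 0.0


def log_trace(trace: LLMTrace) -> None:
    # Placeholder sink: append JSON lines; swap for a real tracing backend.
    with open("traces.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(trace)) + "\n")


# Usage: wrap the model call, measure latency, and record the trace.
start = time.perf_counter()
# output = generate(prompt)  # model call elided in this sketch
trace = LLMTrace(user_input="example question", output="example answer")
trace.latency_ms = (time.perf_counter() - start) * 1000
log_trace(trace)
```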
6) Collaboration & enablement
- Work closely with Product, Design, Data, and QA to translate “vibes” into measurable outcomes.
- Create lightweight playbooks: eval templates, prompt guidelines, incident runbooks, and “how we build with AI here.”
- Improve team velocity by sharing patterns and avoiding AI productivity pitfalls (over-trusting outputs, under-testing).
Requirements
Core Engineering
- 3+ years of software engineering experience (or an equivalent portfolio of shipped products).
- Strong proficiency in Python (required for the training stack), plus at least one of: TypeScript/Node.js, Go, or Java.
- Strong fundamentals: API design, async workflows, testing strategy, debugging, CI/CD, Git, and production ownership.
Open-Source LLM Engineering (Must-have)
- Hands-on experience fine-tuning and/or deploying open-weights LLMs end-to-end: data → training → evaluation → serving.
- Strong understanding of:
  - Tokenization and context budgeting (see the sketch after this list)
  - Dataset quality, leakage risks, and evaluation methodology
  - Instruction tuning concepts and preference optimization fundamentals
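As a rough illustration of the tokenization and context-budgeting item above, a minimal sketch assuming a Hugging Face tokenizer; the model id and the 4096-token window are placeholder assumptions, not the actual target model.

```python
"""Minimal context-budgeting sketch; model id and window size are placeholders."""
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("our-chosen-open-weights-model")  # placeholder

CONTEXT_WINDOW = 4096      # assumed model context length
RESERVED_FOR_OUTPUT = 512  # leave room for the completion


def fit_to_budget(system_prompt: str, question: str, docs: list[str]) -> str:
    """Greedily add retrieved docs until the token budget is exhausted."""
    budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
    parts = [system_prompt, question]
    used = sum(len(tok.encode(p)) for p in parts)
    for doc in docs:
        doc_tokens = len(tok.encode(doc))
        if used + doc_tokens > budget:
            break
        parts.append(doc)
        used += doc_tokens
    return "\n\n".join(parts)
```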
Required languages
| Language | Level |
| --- | --- |
| English | B2 - Upper Intermediate |