Prompt Engineer - Health AI Classification (95%+ Accuracy)
Remote | Hourly Contract | Potential full-time
We're launching a supplement analysis platform covering 25,000+ products, but the accuracy of the data isn't good enough. Infrastructure is done. Prompts need fixing.
What you'll do:
- Optimize 3 critical LLM classification tasks (data filtering, categorization, and extraction)
- Achieve >95% accuracy with cost-effective models (e.g. Gemini, Grok)
- Build regression test coverage in an external testing environment
- Analyze failure patterns and improve prompts systematically
- Document the methodology for the team
Requirements:
- Proven track record achieving >95% accuracy on complex classification
- Experience with high-stakes AI (healthcare/finance/legal)
- Systematic testing approach (not trial-and-error)
- Multi-model expertise (GPT-4, Claude, Gemini, etc.)
Nice to have:
- Healthcare/supplement domain knowledge
- MySQL or similar familiarity
- Structured output optimization (JSON mode, function calling)
- Fine-tuning experience
We provide:
- Complete infrastructure (external testing environment, admin panels)
- Multi-LLM access (Claude, GPT, Gemini, Grok, OpenRouter)
- Data analyst for verification support
- Clear success metrics (>95% accuracy)
- Competitive hourly rate
To Apply:
• Send us a message by clicking "Apply for the job"
• Fill out the questionnaire here: https://tally.so/r/mBO904
Required languages:
- English: B2 (Upper Intermediate)