RL Environments Engineer to $20000
We’re hiring RL Environments Engineers to design and build MLE/SWE environments that deliver high-quality, diverse tasks with minimal supervision. You will target a specific language model, meet a defined difficulty distribution, and deliver about one task every 8-10 hours. This is a remote contractor role; at least 4 hours of overlap with PST working hours and advanced English (C1/C2) are required.
Responsibilities
- Design and build MLE/SWE environments and diverse tasks.
- Target a specified language model and satisfy the required difficulty distribution.
- Deliver ~1 task per 8-10 hours once onboarded.
- Revise tasks within 24 hours of receiving customer feedback.
- Onboard quickly and start delivering on day one with minimal supervision.
Requirements
What we’re looking for (must-haves)
- Strong Python (engineering-quality, not notebook-only).
- Hands-on LLM/GenAI work in production: you’ve shipped and operated real systems (not “wrapped an API and called it AI”).
- Strong product/engineering ownership: comfortable building, fixing, and scaling end-to-end pipelines.
- Docker + production mindset (debugging, reliability, iteration speed).
- ≥4 hours PST overlap and advanced English (C1/C2) for specs, reviews, and feedback.
- Ability to meet throughput expectations and respond quickly to feedback.
Strong signals (nice-to-have, big plus)
- Experience designing environments/tasks for RL and/or evaluations.
- Experience in high-stakes or regulated domains (e.g., healthcare, finance, fraud/risk, safety-critical systems).
- ML systems experience: CI/CD, monitoring, evaluation harnesses, MLOps, scalable pipelines.
- Systems depth: C++/Rust/Scala/Java, performance/infra optimization, distributed systems.
- Exposure to RL / bandits / agentic systems (not required, but a strong signal).
Not a fit if
- You’re primarily a prompt engineer without strong ML/engineering foundations.
- You’re a research-only / academic-only profile with little or no shipping/production ownership.
- You’ve only built in notebooks or rely heavily on managed AutoML tools.
Working Conditions
- Remote, independent contractor engagement.
- Full time (40 hours/week), with at least 4 hours of working-hours overlap with the team in the Pacific time zone.
- Deliverables-driven; begin shipping on day one.
Conversion & relocation: Potential path to FTE and relocation to the Bay Area if performance and mutual fit align.
Required skills experience
| Python | 5 years |
| LLM | 3 years |
| Reinforcement Learning | 6 months |
Required languages
| English | C1 - Advanced |
Python, Machine Learning, Deep Learning, Docker, LLM, Generative AI, English, NLP, Data Science/Machine Learning