RL Environments Engineer, up to $20,000

We’re hiring RL Environments Engineers to design and build MLE/SWE environments that deliver high-quality, diverse tasks with minimal supervision. You will target a specific language model, meet a defined difficulty distribution, and deliver about one task every 8-10 hours once onboarded. This is a remote contractor role; at least 4 hours of overlap with PST working hours and advanced English (C1/C2) are required.

 

Responsibilities

  • Design and build MLE/SWE environments and diverse tasks.
  • Target a specified language model and satisfy the required difficulty distribution.
  • Deliver ~1 task per 8-10 hours once onboarded.
  • Revise tasks within 24 hours of receiving customer feedback.
  • Onboard quickly and start delivering on day one with minimal supervision.

     

Requirements

What we’re looking for (must-haves)

  • Strong Python (engineering-quality, not notebook-only).
  • Hands-on LLM/GenAI work in production: you’ve shipped and operated real systems (not “wrapped an API and called it AI”).
  • Strong product/engineering ownership: comfortable building, fixing, and scaling end-to-end pipelines.
  • Docker + production mindset (debugging, reliability, iteration speed).
  • ≥4 hours PST overlap and advanced English (C1/C2) for specs, reviews, and feedback.
  • Ability to meet throughput expectations and respond quickly to feedback.
     

Strong signals (nice-to-have, big plus)

  • Experience designing environments/tasks for RL and/or evaluations.
  • Experience in high-stakes or regulated domains (e.g., healthcare, finance, fraud/risk, safety-critical systems).
  • ML systems experience: CI/CD, monitoring, evaluation harnesses, MLOps, scalable pipelines.
  • Systems depth: C++/Rust/Scala/Java, performance/infra optimization, distributed systems.
  • Exposure to RL / bandits / agentic systems (not required, but a strong signal).
     

Not a fit if

  • You’re primarily a prompt engineer without strong ML/engineering foundations.
  • You’re a research-only / academic-only profile with little or no shipping/production ownership.
  • You’ve only built in notebooks or rely heavily on managed AutoML tools.

     

Working Conditions

  • Remote, independent contractor engagement.
  • Full time (40 hours/week), with at least 4 hours of working-hours overlap with the team in the Pacific time zone.
  • Deliverables-driven; begin shipping on day one.

Conversion & relocation: Potential path to FTE and relocation to the Bay Area if performance and mutual fit align.
 

Required skills and experience

Python: 5 years
LLM: 3 years
Reinforcement Learning: 6 months

Required languages

English: C1 (Advanced)
Python, Machine Learning, Deep Learning, Docker, LLM, Generative AI, English, NLP, Data Science/Machine Learning
Published 28 January