RL Environments Engineer to $20000

Preference Model Responds Quickly

We’re hiring RL Environments Engineers to design and build MLE/SWE environments that deliver high-quality, diverse tasks, working with minimal supervision. You will target a specific language model, meet a defined difficulty distribution, and deliver about one task every 8-10 hours. This is a remote contractor role that requires at least 4 hours of overlap with PST and advanced English (C1/C2).

Responsibilities

Design and build MLE/SWE environments and diverse tasks.
Target a specified language model and satisfy the required difficulty distribution.
Deliver ~1 task per 8-10 hours once onboarded.
Edit tasks within 24 hours based on customer feedback.
Onboard quickly and start delivering on day one with minimal supervision.

Requirements

What we’re looking for (must-haves)

Strong Python (engineering-quality, not notebook-only).
Hands-on LLM/GenAI work in production: you’ve shipped and operated real systems (not “wrapped an API and called it AI”).
Strong product/engineering ownership: comfortable building, fixing, and scaling end-to-end pipelines.
Docker + production mindset (debugging, reliability, iteration speed).
≥4 hours PST overlap and advanced English (C1/C2) for specs, reviews, and feedback.
Ability to meet throughput expectations and respond quickly to feedback.

Strong signals (nice-to-have, big plus)

Experience designing environments/tasks for RL and/or evaluations.
Experience in high-stakes or regulated domains (e.g., healthcare, finance, fraud/risk, safety-critical systems).
ML systems experience: CI/CD, monitoring, evaluation harnesses, MLOps, scalable pipelines.
Systems depth: C++/Rust/Scala/Java, performance/infra optimization, distributed systems.
Exposure to RL / bandits / agentic systems (not required, but a strong signal).

Not a fit if

You’re primarily a prompt engineer without strong ML/engineering foundations.
You’re a research-only / academic-only profile with little or no shipping/production ownership.
You’ve only built in notebooks or rely heavily on managed AutoML tools.

Working Conditions

Remote, independent contractor engagement.
Full time (40 hours/week), with at least 4 working hours overlapping the team in the Pacific time zone.
Deliverables-driven; begin shipping on day one.

Conversion & relocation: Potential path to FTE and relocation to the Bay Area if performance and mutual fit align.

Required skills experience

Python	5 years
LLM	3 years
Reinforcement Learning	6 months

Required languages

English

C1 - Advanced

Python, Machine Learning, Deep Learning, Docker, LLM, Generative AI, English, NLP, Data Science/Machine Learning

Published 28 January · Updated 28 February

Statistics:

221 views

14 applications

100% read

67% responded

Last responded 1 week ago

Experience: from 5 years (4 years considered)
Salary: $15000-20000 (up to $20500 considered)
Full Remote
Worldwide
English: C1 - Advanced

ML / AI

Employment: Full-time
Domain: Machine Learning / Big Data
Product
A test task is required
