RL Environments Engineer - Low-Level Engineering and Kernel Inference Optimization
We're hiring Low-Level Engineers to design and build RL environments that teach LLMs kernel development, hardware optimization, and systems programming. The goal is to create realistic feedback loops where models learn to write high-performance code across GPU and CPU architectures.
This is a remote contractor role. It requires at least 4 hours of overlap with PST working hours and advanced English (C1/C2).
Requirements
Minimum Qualifications
- Strong Python (engineering-quality, not notebook-only)
- Production mindset (debugging, reliability, iteration speed)
- Clear understanding of LLMs and their current limitations
- Ability to meet throughput expectations and respond quickly to feedback
You may be a good fit if you have experience with one or more of the following
- Deep understanding of memory hierarchies (registers, L1/L2/shared memory, HBM, system RAM) and their performance implications
- Threading models, synchronization primitives, and concurrent programming (warps, thread blocks, barriers, atomics)
- Cache coherence, memory access patterns, coalescing, and bank conflicts
- JIT compilation frameworks (e.g., Triton, JAX/XLA, TorchInductor, Numba)
- AOT compilation and optimization passes (LLVM, MLIR, TVM)
- Compiler and kernel frameworks such as CUTLASS, BitBLAS, or JAX/Pallas
- Modern C++, including templates, concurrency, and build systems
- Assembly-level programming and low-level optimization across GPU and CPU architectures (e.g., x86, ARM, NVIDIA Hopper, NVIDIA Blackwell)
- Debugging and optimizing GPU kernels using CUDA and/or HIP/ROCm
- Developing PyTorch custom operators, backend extensions, or dispatcher integrations (e.g., ATen, TorchScript, or custom backends)
- Customizing, extending, or optimizing c, including distributed inference workflows
- GPU communication libraries and collectives, such as NVIDIA NCCL, AMD RCCL, MPI, or UCX
- Mixed-precision and low-precision kernels (e.g., FP16, BF16, FP8, INT8), including numerical stability and performance trade-offs
Required skills and experience
| Python | 3 years |
| LLM | 3 years |
Required languages
| English | C1 - Advanced |
C++, JIT, Triton, CUDA, CUTLASS, BitBLAS, JAX, TorchInductor, Numba