Lumnix

Inference Optimization Researcher

About Corvex

 

Corvex delivers cloud-based AI infrastructure built on cutting-edge NVIDIA GPUs, combining exceptional reliability, security, performance, and value. We're building a world-class experience for developers and data scientists across enterprise and AI-native organizations, enabling them to focus exclusively on training, fine-tuning, and running inference on their AI models while we manage the nuts and bolts of our premium infrastructure.

 

Position Description

 

We’re looking for a performance-obsessed AI researcher to join our team. Your mission is to turn whitepapers, profiling traces, and raw intuition into working optimizations that make real workloads faster. You’ll work across compiler frameworks, GPU kernels, graph IRs, memory strategies, and quantization techniques - with the freedom to test hypotheses quickly and the responsibility to ship production-worthy gains.

 

This is a research role with an emphasis on delivering production-quality code. You’ll sit at the boundary between compiler experimentation and systems engineering, helping make Ignite one of the fastest inference engines in the world.

 

What You’ll Do

  • Profile and optimize transformer-based inference pipelines
  • Research and experiment with graph-level and kernel-level optimizations
  • Design experiments, measure speedups, and continuously shave off latency across varying batch sizes, sequence lengths, and architectures
  • Collaborate closely with the engineering teams to ensure results translate into real-world improvements
  • Stay ahead of the optimization literature - and bring in ideas before they’re stale

 

What We’re Looking For

  • Strong programming experience in C++ and CUDA
  • Hands-on experience with performance profiling and tuning
  • An existing understanding of GPU architecture, or a strong interest in learning it
  • Comfort working with compiler toolchains and intermediate representations
  • Curiosity and intensity - you enjoy tuning workloads at breakneck speed, not just shipping the baseline
  • Bonus: experience with quantization-aware training, kernel autotuning, or specialized LLM serving runtimes

 

What We Offer

  • Competitive salary with meaningful equity
  • A chance to help define a new category of AI infrastructure
  • Greenfield architecture - build the product you’ve always wanted to use
  • High trust and autonomy, with deep impact on platform direction
  • Remote-first culture with the option to collaborate in person as we scale
  • Small, highly skilled team and zero bureaucracy

Required languages

English C1 - Advanced
AI, ML, Research, C++, CUDA
Published 8 August