Python Engineer (Nvidia CUDA stack)
Description
Since 2023, GlobalLogic has been exploring opportunities to implement ML/AI/GenAI-powered applications for a multinational industrial conglomerate, with the aim of enhancing business efficiency.
You will work on projects involving complex data processing pipelines for predictive maintenance analytics.
Project tech stack: Kubernetes, Terraform, Helm, LLM, ML, AI, asyncio, Python, Pandas, Haystack, Azure Blob Storage, Azure DevOps, SQLAlchemy, Docker, Docker Compose, PySpark, PostgreSQL, FastAPI
Requirements
- 4+ years in backend/performance engineering (Python and/or C++) with production services.
- 1+ year hands-on with GPU-accelerated AI/ML inference (Triton/TensorRT/CUDA) in production.
- Proven delivery of NIM- or Triton-based microservices behind REST/gRPC (autoscaling, rollout strategies, monitoring).
- Practical experience profiling and removing GPU bottlenecks (memory, kernels, launch configs, batching, concurrency).
- Strong system design skills (throughput/latency/SLA) and high code quality (tests, reviews, docs).
Job responsibilities
- Design, build, and operate NIM-powered microservices (LLM, Embeddings, Reranker, ASR/TTS, VLM) for product features.
- Optimize inference end-to-end: TensorRT-LLM engines, quantization, batching, concurrency, KV-cache, CUDA kernels where needed.
- Package & deploy via Triton + Kubernetes (Helm), set up GPU scheduling, MIG/MPS, canary/blue-green strategies.
- Expose stable APIs (REST/gRPC), versioning, auth/rate-limits; deliver SDK/client wrappers for internal teams.
- Instrument: metrics/logs/traces (Prometheus/Grafana/OTel), DCGM, alerts, SLOs, runbooks, cost/capacity planning.
- Collaborate with DS/Platform/Product on priorities, model choices, and integration paths; ship incrementally.
- Maintain quality & security: tests, CI/CD, IaC, dependency hygiene, secrets management, network policies.
- Document designs, benchmarks, and operational playbooks.
Tools
NVIDIA: NIM, Triton, TensorRT / TensorRT-LLM, CUDA, cuDNN, cuBLAS, NCCL, Nsight, DCGM
Backend/AI: Python, FastAPI, asyncio, PyTorch/ONNX, vector search/RAG
Platform: Docker, Kubernetes, Terraform, Helm, Azure Blob/DevOps, PostgreSQL, Redis, Kafka, Grafana/Prometheus/OTel
Required languages
| English | B2 - Upper Intermediate |