Computer Vision Specialist / Detection Model Training Engineer (Python, Deep Learning)

Responsibilities:

Design and implement scalable training pipelines for image and video detection models using large-scale real-world datasets.
Fine-tune and evaluate transformer-based and CNN-based detection and segmentation models (e.g., YOLOv8, DINO, SAM, DETR, Mask R-CNN) for tasks such as object detection, segmentation, and visual tracking.
Collaborate with data engineers to preprocess, clean, and structure large volumes of image/video data.
Optimize model training for speed and accuracy on multi-GPU clusters (e.g., using PyTorch DDP, DeepSpeed, or Hugging Face Accelerate).
Continuously benchmark model performance using established metrics (mAP, IoU, AR, etc.) and custom evaluation scripts.
Experiment with techniques such as multi-scale training, data augmentation, semi-supervised learning, and prompt-based vision models (e.g., SAM or DINO-based pipelines).

Requirements:

Strong proficiency in Python and deep learning frameworks (PyTorch preferred).
Hands-on experience training or fine-tuning computer vision models on large-scale image/video datasets (COCO, LVIS, ImageNet, custom video datasets, etc.).
Familiarity with modern detection and segmentation frameworks such as YOLO (v8-v12), DINO, SAM, DETR, and other transformer-based vision models.
Experience with distributed training setups (e.g., DDP, DeepSpeed, FSDP), mixed-precision training, and memory optimization.
Solid grasp of computer vision fundamentals: object detection, instance segmentation, image augmentations, and feature pyramids.
Experience managing training workflows: checkpoints, hyperparameter tuning, logging (e.g., TensorBoard, Weights & Biases), and model versioning.

Bonus:

Knowledge of efficient model deployment techniques: ONNX, TensorRT, quantization, pruning, or distillation.
Experience integrating detection models into real-time pipelines or edge environments.
Contributions to open-source vision libraries or published research in vision conferences (e.g., CVPR, ICCV, ECCV).

Preferred Qualifications:

Familiarity with large-scale annotation workflows or synthetic data generation for visual tasks.
Experience evaluating vision models with custom benchmarks or using tools like FiftyOne, CVAT, or Roboflow.

Required languages:

English (B1 - Intermediate)

Published 12 August
