Computer Vision Specialist / Detection Model Training Engineer (Python, Deep Learning)

Responsibilities:

  • Design and implement scalable training pipelines for image and video detection models using large-scale real-world datasets.
  • Fine-tune and evaluate transformer-based and CNN-based detection models (e.g., YOLOv8, DINO, SAM, DETR, Mask R-CNN) for tasks such as object detection, segmentation, and visual tracking.
  • Collaborate with data engineers to preprocess, clean, and structure large volumes of image/video data.
  • Optimize model training for speed and accuracy on multi-GPU clusters (e.g., using PyTorch DDP, DeepSpeed, or Hugging Face Accelerate).
  • Continuously benchmark model performance using established metrics (mAP, IoU, AR, etc.) and custom evaluation scripts.
  • Experiment with techniques such as multi-scale training, data augmentation, semi-supervised learning, and prompt-based vision models (e.g., SAM or DINO-based pipelines).

     

Requirements:

  • Strong proficiency in Python and deep learning frameworks (PyTorch preferred).
  • Hands-on experience training or fine-tuning computer vision models on large-scale image/video datasets (COCO, LVIS, ImageNet, custom video datasets, etc.).
  • Familiarity with modern detection and segmentation frameworks such as YOLO (v8-v12), DINO, SAM, DETR, and other transformer-based vision models.
  • Experience with distributed training setups (e.g., DDP, DeepSpeed, FSDP), mixed-precision training, and memory optimization.
  • Solid grasp of computer vision fundamentals: object detection, instance segmentation, image augmentations, and feature pyramids.
  • Experience managing training workflows: checkpoints, hyperparameter tuning, logging (e.g., TensorBoard, Weights & Biases), and model versioning.

 

Bonus:

  • Knowledge of efficient model deployment techniques: ONNX, TensorRT, quantization, pruning, or distillation.
  • Experience integrating detection models into real-time pipelines or edge environments.
  • Contributions to open-source vision libraries or published research in vision conferences (e.g., CVPR, ICCV, ECCV).

 

Preferred Qualifications:

  • Familiarity with large-scale annotation workflows or synthetic data generation for visual tasks.
  • Experience evaluating vision models with custom benchmarks or using tools like FiftyOne, CVAT, or Roboflow.

Required languages:

English B1 - Intermediate
Published 12 August