Senior AI/ML Engineer (Multimodal AI / LLM Evaluation / Trust and Safety)

We are looking for a Senior AI/ML Engineer experienced in multimodal AI pipelines, LLM evaluation frameworks, agent orchestration, and scalable Python backend systems on AWS.

 

A Korean company is building and hardening an AI verification and compliance layer for AIGC/UGC financial content.

The system includes:

  • four evaluation/verification models
  • an automated compliance rule engine
  • a Trust & Safety moderation pipeline

The core architecture processes video-based content through a multi-stage pipeline:

STT → OCR → image/video analysis → financial rules engine → LLM-based decision layer

Final output decision types:

  • Auto-Block
  • Auto-Pass
  • Human Review
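
For candidates who want a concrete picture, the routing at the end of the pipeline could look roughly like the sketch below. It is illustrative only and does not describe the actual system: the `StageResult` type, the per-stage risk scores, and both thresholds are hypothetical assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_BLOCK = "Auto-Block"
    AUTO_PASS = "Auto-Pass"
    HUMAN_REVIEW = "Human Review"


@dataclass
class StageResult:
    stage: str   # hypothetical stage label, e.g. "stt", "ocr", "rules", "llm"
    risk: float  # hypothetical risk score in [0, 1]


def decide(results, block_at=0.9, pass_below=0.2):
    """Route content by the highest risk score seen across all stages.

    The thresholds are assumed values for illustration, not tuned numbers.
    """
    if not results:
        # No signal at all: send to a human rather than guess.
        return Decision.HUMAN_REVIEW
    worst = max(r.risk for r in results)
    if worst >= block_at:
        return Decision.AUTO_BLOCK
    if worst < pass_below:
        return Decision.AUTO_PASS
    return Decision.HUMAN_REVIEW
```

Anything between the two thresholds falls through to Human Review, so uncertain cases are never decided automatically.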

This role requires strong production engineering skills combined with hands-on experience in multimodal LLM evaluation systems.

 

Required Skills & Experience

  • Multimodal LLM evaluation pipeline design
    (LLM-as-a-judge, rubric-based scoring, prompt/few-shot optimization, schema design, agent orchestration)
  • Model evaluation engineering
    (golden dataset creation, metric definition: precision/recall/human agreement, A/B testing, benchmarking automation)
  • AI serving & inference optimization
    (AWS Bedrock integration, vLLM / GPU inference, latency optimization, cost-aware design, fallback strategies)
  • Strong Python backend engineering (~7–10 years)
    (async systems, queues, microservices, service-oriented architecture)
  • Trust & Safety / compliance systems
    (rule engines: keyword + context IF-THEN logic, moderation workflows, HITL queues, severity-based escalation)
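
As a rough illustration of the "keyword + context IF-THEN" rule style with severity-based escalation mentioned above, here is a minimal sketch. The rule names, keywords, severity levels, and queue names are all hypothetical and are not taken from the actual product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    keyword: str    # IF this keyword appears ...
    context: tuple  # ... AND at least one of these context terms appears
    severity: int   # THEN flag with this severity (1 = low, 3 = high)


# Hypothetical example rules for financial content.
RULES = [
    Rule("guaranteed_return", "guaranteed", ("return", "profit"), 3),
    Rule("urgency", "act now", ("invest", "buy"), 2),
]


def evaluate(text, rules=RULES):
    """Return every rule whose keyword and context both match the text."""
    lowered = text.lower()
    return [
        r for r in rules
        if r.keyword in lowered and any(c in lowered for c in r.context)
    ]


def escalation(hits):
    """Map the highest matched severity to a (hypothetical) review queue."""
    if not hits:
        return "none"
    top = max(r.severity for r in hits)
    return "priority_review" if top >= 3 else "standard_review"
```

The context check is what keeps a phrase like "guaranteed fresh" from matching a rule aimed at "guaranteed return" claims.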

Nice to Have

  • Experience fine-tuning or improving LLM / video understanding models
  • Background in content moderation or compliance systems
  • Experience with Korean language content processing
    (STT, OCR optimization, NSFW detection pipelines)

 

Tech Stack

Languages / APIs: Python (FastAPI), React (TypeScript)
LLM / Multimodal: AWS Bedrock, VLMs (e.g. Qwen2.5-Omni), vLLM
Agent frameworks: LangGraph, CrewAI, Google ADK, LangChain
Evaluation: LLM-as-judge, rubric scoring, DeepEval-style eval harnesses
Cloud / Infra: AWS Bedrock, SageMaker, RDS (PostgreSQL), S3, EKS
Observability: Langfuse, prompt testing & tracing tools
Data platform: Databricks
Media processing: Whisper, OCR pipelines, image moderation tools (e.g. AWS Rekognition)

 

Required languages

English B2 - Upper Intermediate
Ukrainian Native
Published 5 June