Labelwise

ML Engineer — Structured Data Extraction

$$$

Remote · Contract · EU time zone preferred · Capacity: propose what the scope needs.


We pay for outcomes, not hours. You own the pipeline; we own the product.

Contract engagement, scope-driven. Propose your own structure — trial period, fixed-price milestones, retainer, or a hybrid. Whatever fits how you actually work.
 

We're building the definitive structured quality, safety & value dataset for the global supplement market. Two customers: consumers (React Native app) and AI systems (structured API). The data pipeline IS the product. 
 

What you deliver:

A pipeline that generalizes across our module types: boolean classification, multi-class classification, characteristic extraction, entity extraction, entity linking/grouping, and structured attribute profiling.

Core deliverables:

  • Golden-record schema and workflow that works for any of the above, co-built with our regulatory SME
  • Eval harness in CI — accuracy reported on every commit, per module
  • Versioned prompt/rule system with rollback
  • Measured, reproducible accuracy numbers on held-out sets, per module, against written gates
  • A workflow our SME can operate without a developer in the loop

Success = numbers we trust, on sets we trust, with a pipeline that won't silently regress — and a framework that scales to every remaining module without rework.
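To make "accuracy per module, against written gates" concrete, here is a minimal sketch of the kind of check an eval harness could run in CI. It is an illustration only, not our actual harness: the module names, gate thresholds, and function names are hypothetical, and exact-match accuracy stands in for whatever metric each module's acceptance criteria specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    module: str
    accuracy: float
    gate: float
    passed: bool


def accuracy(predictions, golden):
    """Exact-match accuracy of predictions against golden records."""
    if len(predictions) != len(golden):
        raise ValueError("predictions and golden records differ in length")
    if not golden:
        raise ValueError("held-out set is empty")
    correct = sum(p == g for p, g in zip(predictions, golden))
    return correct / len(golden)


def check_gates(results_by_module, gates):
    """Score every module and compare it with its written gate.

    results_by_module maps a module name to (predictions, golden).
    gates maps a module name to the minimum acceptable accuracy.
    """
    report = []
    for module, (preds, gold) in sorted(results_by_module.items()):
        if module not in gates:
            # A module without a written gate is an error, not a silent pass.
            raise KeyError(f"no gate defined for module {module!r}")
        acc = accuracy(preds, gold)
        report.append(GateResult(module, acc, gates[module], acc >= gates[module]))
    return report


def exit_code(report):
    """Return 0 if every module meets its gate, 1 otherwise (for CI)."""
    return 0 if all(r.passed for r in report) else 1


# Hypothetical held-out results for two modules.
results = {
    "is_vegan": ([True, False, True, True], [True, False, False, True]),
    "dosage_form": (["capsule", "tablet"], ["capsule", "tablet"]),
}
gates = {"is_vegan": 0.90, "dosage_form": 0.95}

report = check_gates(results, gates)
for r in report:
    status = "PASS" if r.passed else "FAIL"
    print(f"{r.module}: accuracy={r.accuracy:.2f} gate={r.gate:.2f} {status}")
```

In a CI job, the value returned by exit_code(report) would be passed to sys.exit so that a module falling below its gate fails the commit instead of regressing silently.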
 

Stack: Python, DSPy, vLLM, PaddleOCR, RT-DETR, open-weight VLMs and LLMs. Laravel/Filament backend (you consume it; our backend team maintains it).
 

Required:

  • 3+ years shipping production ML/data pipelines — not research
  • Prompt optimization at scale: DSPy / BootstrapFewShot / MIPROv2, or equivalent (LangChain + pytest eval harness counts)
  • VLM inference serving (vLLM / TGI / llama.cpp)
  • OCR pipelines wired to detection models
  • Eval-first: you ship the harness before the pipeline
  • Clean Python, git discipline, versioned prompts
  • You've maintained a pipeline in production for 12+ months (not just built and handed off)

 

Not a fit if you:

  • Propose RAG for structured extraction
  • Want to fine-tune before prompt-optimizing
  • Want to pre-train a domain model
  • Need a PM to translate goals into tickets
     

Nice to have: Comfortable reading enough PHP/Laravel to integrate with our Filament admin.

 

We provide: 

  • Direct CEO collaboration, fast decisions
  • Regulatory SME owns golden records and sign-off
  • Backend team handles schema and infra
  • Written acceptance criteria before each module
     

Compensation: fixed-price per milestone, or retainer. Propose your structure with your application.
 

How to apply?

Fill out the questionnaire below and submit it.

NB. No cover letters. No AI-generated applications — we'll notice.

Required skills experience

Python 4 years
Machine Learning 3 years
Machine Learning model evaluation 2 years
LLM 2 years
Computer Vision 2 years

Required languages

English B2 - Upper Intermediate

DSPy, vLLM, PaddleOCR, Prompt Engineering, MLOps, Docker, Git, MySQL, OCR, REST API
Published 21 April