Senior ML Engineer

Department/Project Description

The Client has been a leader in medical science for more than 40 years and is committed to solving the challenges that matter most, united by a deep caring for human life. Its mission is to advance science for life and transform lives through innovative medical solutions that improve patient lives, create value for end-users, and support the employees and the communities in which the Client operates.

Choosing a career with our team isn’t just business, it’s personal. If you’re a natural problem-solver with the imagination, determination, and spirit to make a meaningful difference to people worldwide, we encourage you to apply and look forward to connecting with you!



Job Description

We are looking for a specialized Machine Learning Engineer to build an end-to-end Automated Document Comparison & Compliance System. In this role, you will develop a pipeline that ingests complex regulatory documents, identifies critical entities, and detects semantic changes between versions. You will bridge the gap between unstructured documents (PDFs, scans) and structured, actionable insights for regulatory teams.

Must-Have

  • Core Language: Expert-level proficiency in Python.
  • NLP Frameworks: Strong experience with spaCy, Hugging Face Transformers, and NLTK.
  • Document Intelligence: Hands-on experience with OCR tools and LayoutLM (or similar document understanding models).
  • Computer Vision: Familiarity with OpenCV or similar libraries for image processing and symbol detection.
  • Embeddings: Deep understanding of vector databases and semantic search techniques.
  • General ML: Proficiency with PyTorch or TensorFlow.

Nice-to-Have

  • Experience working with highly regulated documents (Medical, Legal, or Financial).
  • Experience extracting data from complex tables within PDFs.

Job Responsibilities

1. Intelligent Document Processing (OCR & Layout Analysis)

Pipeline Building: Develop pipelines to convert scanned and digital documents into machine-readable formats using OCR tools (e.g., Tesseract, AWS Textract, or Azure Form Recognizer).

Segmentation: Implement logic to segment documents into structural blocks (headers, paragraphs, tables, and images) so that document structure is normalized before comparison.

2. NLP & Entity Recognition (NER)

Custom NER Models: Train and fine-tune Named Entity Recognition (NER) models to extract domain-specific entities such as device names, dosages, warnings, and regulatory standards.

Data Structuring: Map extracted entities to predefined categories to enable "focused comparison" of medically and legally relevant content rather than plain text matching.

3. Semantic Analysis & Change Detection

Embeddings & Alignment: Use semantic embeddings (e.g., from BERT, RoBERTa, or Sentence Transformers models) to align text blocks across different document versions, even when content has moved.
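
For candidates who want a concrete picture of block alignment, here is a minimal sketch. It is an illustration under stated assumptions, not the Client's implementation: the `embed` function is a hypothetical stand-in that uses token counts instead of a real embedding model, and the `threshold` value is arbitrary. In a production pipeline, `embed` would presumably call a sentence-embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    # Hypothetical stand-in for a real embedding model: a bag of lowercase tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align_blocks(old_blocks, new_blocks, threshold=0.5):
    # Greedily pair each old block with its most similar unused new block.
    # Returns (old_index, new_index) pairs; new_index is None when no
    # candidate reaches the (assumed) similarity threshold.
    new_vecs = [embed(b) for b in new_blocks]
    used = set()
    pairs = []
    for i, block in enumerate(old_blocks):
        vec = embed(block)
        best_j, best_score = None, threshold
        for j, new_vec in enumerate(new_vecs):
            if j in used:
                continue
            score = cosine(vec, new_vec)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
        pairs.append((i, best_j))
    return pairs


old_version = [
    "Warning: do not immerse the device in water.",
    "Recommended dosage is 5 mg per day.",
]
new_version = [
    "Recommended dosage is 10 mg per day.",
    "Intended for single use.",
    "Warning: do not immerse the device in water.",
]
print(align_blocks(old_version, new_version))
```

The warning block is matched even though it moved from the first position to the last, and the dosage block is matched despite the changed value. Greedy matching is only one possible design; a global assignment method could give better pairings on long documents.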

Granular Diffing: Create algorithms to detect "Modified," "Added," or "Removed" content at the sentence/phrase level, ensuring high traceability for audit logs.
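
As a rough illustration of sentence-level change detection, the sketch below labels differences between two versions of a text block using Python's standard `difflib`. It is a simplified example under assumptions: the regex sentence splitter is naive (a real pipeline would more likely use a proper NLP sentence segmenter), and the helper names `split_sentences` and `diff_sentences` are hypothetical.

```python
import difflib
import re
from itertools import zip_longest


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def diff_sentences(old_text, new_text):
    # Return a list of (label, old_sentence, new_sentence) tuples,
    # where label is "Modified", "Added", or "Removed".
    old = split_sentences(old_text)
    new = split_sentences(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "delete":
            changes.extend(("Removed", s, None) for s in old[i1:i2])
        elif tag == "insert":
            changes.extend(("Added", None, s) for s in new[j1:j2])
        else:  # "replace": pair sentences in order, leftovers are added/removed
            for o, n in zip_longest(old[i1:i2], new[j1:j2]):
                if o is None:
                    changes.append(("Added", None, n))
                elif n is None:
                    changes.append(("Removed", o, None))
                else:
                    changes.append(("Modified", o, n))
    return changes


old_block = "Store below 25 C. Do not reuse. Keep dry."
new_block = "Store below 30 C. Keep dry. Single use only."
for change in diff_sentences(old_block, new_block):
    print(change)
```

Each tuple keeps both the old and the new sentence, which is the kind of record an audit log needs for traceability.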

4. Computer Vision (Image & Symbol Analysis)

Visual Comparison: Build Computer Vision modules to extract and compare non-text elements like figures, diagrams, and regulatory symbols.

Compliance Checks: Automate the detection of changes in pictograms or visual warnings to ensure visual compliance.

5. Output Generation

Work with backend teams to convert ML findings into structured comparison tables and narrative summaries for export to Excel/Word/PDF.

Required languages

English B1 - Intermediate
Published 15 December 2025