Senior Data Engineer
Our client is a legal tech startup focused on AI and machine learning, specifically building chatbots that answer legal questions for lawyers. They are looking for a Senior Data Engineer for a high-impact project: digitizing the law of Morocco and the wider African continent and creating the first AI-queryable legal knowledge base.
Their ambition is to build a platform capable of answering legal questions in a reliable, well-sourced, and traceable way, based on a massive corpus of heterogeneous legal documents.
🚀 Why this project is different
You will join a true “knowledge infrastructure” mission:
- Contribute to making the law more accessible
- Build a durable asset: a structured database of Moroccan law (in French), extensible to Africa
- Work on a concrete and deep technical challenge: transforming unstructured data into usable, reliable, and maintainable data at scale
Required skills:
- 3+ years of experience in Data Engineering and/or applied Document AI / NLP
- Strong proficiency in Python
- Hands-on experience with unstructured documents: PDF parsing, OCR, cleaning, structuring
- Experience delivering to production: robust pipelines, observability, quality, performance
🛠 Stack/skills (indicative)
- Storage: AWS
- Document processing: OCR/parsing tools, text preprocessing pipelines
- Testing & quality: metrics, sampling, automated validation
⭐ Nice to have
- Experience with legal / regulatory corpora or high-precision content
- Familiarity with multilingual issues and encoding
- Basic knowledge of downstream needs (vector DBs, retrieval, citation)
Scope of work:
You will be responsible for the “documents → structured data” pipeline that will feed the client's AI (RAG) engine.
At the core of the role (technical focus)
Build a structured database of Moroccan law in French from highly heterogeneous data:
- PDFs (text-based and scanned), Word files, images, text files, sometimes noisy or incomplete
- Text extraction (parsing + OCR when needed), cleaning
- Structuring: detection of titles/chapters/sections/articles, hierarchy, normalization
- Intelligent chunking (based on legal structure rather than arbitrary size), with traceability (source, page, identifiers); a minimal sketch follows this list
- Metadata: date, type of text (law/decree/circular/case law, etc.), source, version, article numbers, etc.
- Deduplication & versioning: redundant documents, amendments, consolidated versions
- Industrialization: orchestration, logs, retries, idempotence, monitoring, quality tests
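To make the chunking item above concrete, here is a minimal sketch of structure-aware chunking with traceability metadata. Everything in it (the `Chunk` dataclass, the `ARTICLE_RE` regex, the `chunk_by_article` function) is an illustrative assumption, not the client's actual pipeline:

```python
# Minimal sketch: structure-aware chunking of a French legal text.
# All names and the heading regex are illustrative assumptions.
import re
from dataclasses import dataclass, field
from typing import Optional

# French legal texts typically number provisions "Article premier", "Article 2", ...
ARTICLE_RE = re.compile(r"^Article\s+(premier|\d+)", re.MULTILINE)

@dataclass
class Chunk:
    text: str                 # the article's full text
    source: str               # originating document (file name, URL, ...)
    page: Optional[int]       # page in the source PDF, if known
    article_id: str           # normalized identifier, e.g. "article-1"
    metadata: dict = field(default_factory=dict)

def chunk_by_article(text: str, source: str, page: Optional[int] = None) -> list:
    """Split cleaned legal text on article boundaries rather than a fixed
    token count, so each chunk stays a self-contained, citable unit."""
    chunks = []
    matches = list(ARTICLE_RE.finditer(text))
    for i, m in enumerate(matches):
        start = m.start()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        number = "1" if m.group(1) == "premier" else m.group(1)
        chunks.append(Chunk(
            text=text[start:end].strip(),
            source=source,
            page=page,
            article_id=f"article-{number}",
        ))
    return chunks

if __name__ == "__main__":
    sample = "Article premier\nLa présente loi...\n\nArticle 2\nSont abrogées..."
    for c in chunk_by_article(sample, source="bo_1234.pdf", page=3):
        print(c.article_id, "->", c.text[:30])
```

Splitting on article boundaries, and carrying source and page through each chunk, is what makes downstream retrieval answers citable and traceable rather than approximate.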
Required skills & experience
| Skill | Experience |
| --- | --- |
| Data Engineering | 4 years |
| NLP | 3 years |
| Python | 4 years |
| AWS | 4 years |
| OCR | 3 years |
Required languages
| Language | Level |
| --- | --- |
| English | B2 - Upper Intermediate |