Product-Minded Full-Stack / AI Engineer for Web Data Intelligence Platform

# Job Description

We are building an AI-powered platform that collects, structures, and analyzes information from public websites and documents.

The system needs to process messy real-world data from many organizations, extract useful structured information, store it in a database, and present it through a user-facing dashboard.

This is not a basic scraping project. We are looking for someone who can think through system architecture, data pipelines, database structure, AI extraction workflows, and product design.

The right person should be able to take an existing MVP, understand the logic behind it, and help redesign it into a more scalable and configurable system.

# What You’ll Work On

You will help design and build a platform that can:

* Discover and process public web pages and documents.
* Extract structured information from unstructured text and PDFs.
* Normalize information into a database.
* Track updates over time.
* Detect duplicates and changes.
* Classify and score records based on configurable rules.
* Match structured records to user-defined profiles.
* Support a dashboard for review, filtering, and workflow management.
* Make the system configurable across different use cases.

# Required Skills

Strong fit if you have experience with:

* Python and/or Node.js
* Web crawling / scraping
* PDF and document extraction
* LLM-based extraction and classification
* Database design
* Full-stack development
* Data pipelines
* AI workflow design
* Prompt engineering
* Handling messy real-world data

Bonus if you have worked with:

* Public web data
* Search or monitoring systems
* Workflow dashboards
* Vector databases
* Supabase / Postgres
* Playwright
* OpenAI / Claude APIs
* LangChain / LlamaIndex or similar tools

# Important

We are not looking for someone who only knows how to scrape websites.

We are looking for someone who can help turn an MVP into a structured, scalable intelligence platform.

You should be comfortable thinking about:

* Data models
* Extraction quality
* Source reliability
* Change detection
* Deduplication
* Configurable scoring logic
* Human review workflows
* Product usability

# What We Are Looking For

A strong candidate will think in terms of:

* Sources
* Documents
* Records
* Evidence
* Configurations
* Profiles
* Matches
* Processing runs
* Review workflows
* Data quality
* Deduplication
* Change tracking
* Explainable scoring

A weak candidate will only talk about scraping pages.

# Engagement

This will start as a short test project. If successful, it can become an ongoing engineering role.

We are looking for someone who can eventually take ownership of the platform architecture and development.

# To Apply

Please answer the following questions:

1. Describe a similar system you have built or would know how to build.
2. How would you separate use-case-specific logic from the core platform?
3. How would you detect whether a record is new, updated, duplicate, stale, or no longer relevant?
4. What database schema would you start with?
5. What would you automate, and where would you keep human review?
6. Please share examples of relevant work if available.

Applications without thoughtful answers will not be considered.

Required domain experience

PropTech / Real Estate: 1 year

Required languages

English C1 - Advanced

Published 16 June

Experience: from 8 years (candidates with 7 years considered)
Work format: Full Remote
Location: Worldwide

Data Engineer

Employment: Full-time
Domain: Fintech
Startup
Test task is needed
