Product-Minded Full-Stack / AI Engineer for Web Data Intelligence Platform
Summary
# Job Description
We are building an AI-powered platform that collects, structures, and analyzes information from public websites and documents.
The system needs to process messy real-world data from many organizations, extract useful structured information, store it in a database, and present it through a user-facing dashboard.
This is not a basic scraping project. We are looking for someone who can think through system architecture, data pipelines, database structure, AI extraction workflows, and product design.
The right person should be able to take an existing MVP, understand the logic behind it, and help redesign it into a more scalable and configurable system.
# What You'll Work On
You will help design and build a platform that can:
* Discover and process public web pages and documents.
* Extract structured information from unstructured text and PDFs.
* Normalize information into a database.
* Track updates over time.
* Detect duplicates and changes.
* Classify and score records based on configurable rules.
* Match structured records to user-defined profiles.
* Support a dashboard for review, filtering, and workflow management.
* Make the system configurable across different use cases.
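To make the "track updates" and "detect duplicates and changes" items concrete, here is a minimal sketch of one possible approach: fingerprinting normalized text and comparing it against what was seen before. It is an illustration only, not a description of the existing MVP. The function names, the in-memory `seen` dictionary, and the status labels are all assumptions; a real system would likely need fuzzier matching and persistent storage.

```python
import hashlib
from typing import Dict


def fingerprint(text: str) -> str:
    """Hash case- and whitespace-normalized text so trivial formatting changes are ignored."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify(key: str, text: str, seen: Dict[str, str]) -> str:
    """Return 'new', 'updated', 'unchanged', or 'duplicate', then store the latest fingerprint.

    `key` is a hypothetical stable identifier for a record (for example a source URL);
    `seen` maps keys to the fingerprint stored on the previous run.
    """
    digest = fingerprint(text)
    previous = seen.get(key)
    if previous is None:
        # Unknown key: the same content under another key counts as a duplicate.
        status = "duplicate" if digest in seen.values() else "new"
    elif previous == digest:
        status = "unchanged"
    else:
        status = "updated"
    seen[key] = digest
    return status


seen: Dict[str, str] = {}
print(classify("a", "Hello  World", seen))
print(classify("a", "hello world", seen))
print(classify("a", "hello there", seen))
print(classify("b", "Hello there", seen))
```

Exact hashing only catches identical content after normalization. Near-duplicates and "stale" or "no longer relevant" states would need additional signals such as similarity measures or last-seen timestamps.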
# Required Skills
Strong fit if you have experience with:
* Python and/or Node.js
* Web crawling / scraping
* PDF and document extraction
* LLM-based extraction and classification
* Database design
* Full-stack development
* Data pipelines
* AI workflow design
* Prompt engineering
* Handling messy real-world data
Bonus if you have worked with:
* Public web data
* Search or monitoring systems
* Workflow dashboards
* Vector databases
* Supabase / Postgres
* Playwright
* OpenAI / Claude APIs
* LangChain / LlamaIndex or similar tools
# Important
We are not looking for someone who only knows how to scrape websites.
We are looking for someone who can help turn an MVP into a structured, scalable intelligence platform.
You should be comfortable thinking about:
* Data models
* Extraction quality
* Source reliability
* Change detection
* Deduplication
* Configurable scoring logic
* Human review workflows
* Product usability
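As one example of what "configurable scoring logic" could mean in practice, the sketch below keeps the rules in plain data and has the core code only interpret them. The rule format (`field`, `equals`, `contains`, `weight`, `reason`) and the sample values are hypothetical, chosen for illustration rather than taken from the existing system.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical use-case configuration: rules live in data, not in code.
CONFIG: Dict[str, Any] = {
    "rules": [
        {"field": "category", "equals": "permit", "weight": 2.0,
         "reason": "category is permit"},
        {"field": "description", "contains": "residential", "weight": 1.5,
         "reason": "description mentions residential"},
    ]
}


def score_record(record: Dict[str, Any], config: Dict[str, Any]) -> Tuple[float, List[str]]:
    """Sum the weights of matching rules and collect a reason for each match."""
    total = 0.0
    reasons: List[str] = []
    for rule in config["rules"]:
        value = str(record.get(rule["field"], "")).lower()
        if "equals" in rule:
            matched = value == rule["equals"].lower()
        elif "contains" in rule:
            matched = rule["contains"].lower() in value
        else:
            matched = False  # Unknown rule types never match.
        if matched:
            total += rule["weight"]
            reasons.append(rule["reason"])
    return total, reasons


score, reasons = score_record(
    {"category": "Permit", "description": "New residential development"}, CONFIG
)
print(score, reasons)
```

Returning the reasons alongside the score keeps the result explainable, and switching use cases means swapping `CONFIG` rather than editing `score_record`.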
# Initial Paid Test Project
The first step will be a paid test project.
You will receive limited sample data, example outputs, and a simplified description of the existing MVP.
Your task will be to:
1. Understand the current MVP structure.
2. Propose a more scalable system architecture.
3. Propose a database schema.
4. Define how configurable extraction and scoring should work.
5. Build a small prototype that processes a few public URLs or documents and produces structured output.
The prototype does not need to be production-ready. We care more about your thinking, structure, and ability to generalize the system.
# Test Project Deliverables
Please deliver:
* Short design document.
* Simple architecture diagram.
* Database schema proposal.
* Configuration example.
* Working prototype code.
* Example structured output.
* Notes on what you would build next.
# Example Output Fields
The system should produce structured records with fields such as:
* Source organization
* Record title
* Record description
* Category
* Status or stage
* Date / timeline if available
* Source URL or document
* Evidence from source
* Confidence score
* Match score against a sample profile
* Reason for match
* Suggested next action
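For illustration, the fields above could map onto a record shape like the following. The attribute names, types, and placeholder values are assumptions made for this sketch, not the schema of the existing MVP.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Record:
    """One structured record; attribute names are illustrative only."""
    source_organization: str
    title: str
    description: str
    category: str
    status: str
    source_url: str
    evidence: str          # Quoted text from the source that supports the record.
    confidence: float      # Extraction confidence, assumed here to be in [0, 1].
    match_score: float     # Score against a sample profile.
    match_reason: str
    next_action: str
    timeline: Optional[str] = None  # Date or timeline, if the source provides one.


# Placeholder values only; not real data.
example = Record(
    source_organization="Example Organization",
    title="Example record title",
    description="Short description extracted from the source.",
    category="example-category",
    status="open",
    source_url="https://example.com/document.pdf",
    evidence="Quoted sentence from the source document.",
    confidence=0.8,
    match_score=0.6,
    match_reason="Category matches the sample profile.",
    next_action="Send to human review.",
)

print(json.dumps(asdict(example), indent=2))
```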
# What We Are Looking For
A strong candidate will think in terms of:
* Sources
* Documents
* Records
* Evidence
* Configurations
* Profiles
* Matches
* Processing runs
* Review workflows
* Data quality
* Deduplication
* Change tracking
* Explainable scoring
A weak candidate will only talk about scraping pages.
# Engagement
This will start as a paid test project. If successful, it can become an ongoing engineering role.
We are looking for someone who can eventually take ownership of the platform architecture and development.
# To Apply
Please answer the following questions:
1. Describe a similar system you have built or would know how to build.
2. How would you separate use-case-specific logic from the core platform?
3. How would you detect whether a record is new, updated, duplicate, stale, or no longer relevant?
4. What database schema would you start with?
5. What would you automate, and where would you keep human review?
6. Please share examples of relevant work if available.
Applications without thoughtful answers will not be considered.
Required domain experience
| PropTech / Real Estate | 1 year |
Required languages
| English | C1 - Advanced |