Senior Web Intelligence & Parsing Engineer

Remote | Full-Time | Flexible Hours (3+ hrs/day EST overlap)

Comiq.ai is an AI-native platform helping investment teams build stronger capital relationships through smarter outreach. We combine structured data, AI enrichment, and real-time signals to replace generic fundraising workflows with precision targeting and contextual messaging.

Behind the scenes, our platform transforms noisy, fragmented web data into clean, structured inputs — powering enrichment pipelines and insight generation at scale. We’re hiring a Senior Web Intelligence & Parsing Engineer to lead the ingestion layer of our stack. Your mission: extract and structure data from 10M+ company websites, team pages, filings, and LinkedIn-style public profiles—then convert that chaos into clean, structured signals ready for AI enrichment and vector search.

🔧 What You’ll Own

Scraping at Scale

Build and manage high-throughput scrapers for structured and semi-structured web sources
Extract people/org/fund info from websites, portals, filings, dark web sources, and public directories (e.g. LinkedIn-scale)
Operate large-scale crawlers with stealth browsers, IP rotation, session spoofing, and full automation

Smart Parsing & Structuring (Perplexity/Manus-style)

Use open-source frameworks like Unstructured.io, pdfplumber, and trafilatura to segment and extract content
Chunk and tag content sections (e.g. mandate, team, strategy) and output structured JSON + full text
Integrate prompt-based parsing or LLM-assisted validation when needed

Framework Integration & Agent Management

Leverage and extend tools like LangChain, LlamaIndex, and Haystack to manage parsing agents
Orchestrate crawlers and parsers across source types, retry logic, and evolving page structures
Monitor performance, schema compliance, and pipeline yield at scale

Collaboration with Backend/LLM Engineer

Deliver structured data for downstream LLM enrichment (RAG search, entity resolution, semantic classification)
Coordinate pipeline triggers, batch jobs, and API endpoints with backend engineering

🛠️ Our Stack

Scraping

Playwright · Puppeteer · Stealth Browsers · Proxy Pools · Tor

Parsing

Unstructured.io · pdfplumber · trafilatura · BeautifulSoup · regex + prompt hybrids

Orchestration

LangChain · LlamaIndex · Haystack · Docker · GitHub Actions

Storage & Flow

PostgreSQL · Redis · ClickHouse · PeerDB

LLMs & Embeddings

OpenAI · Hugging Face · SentenceTransformers · GH200s via Lambda Labs

✅ You’re a Fit If You...

Have 5–10+ years of experience in large-scale web data extraction and pipeline engineering
Have previously scraped or managed 1M+ entity-scale data sets (e.g., PDL, Apollo, Crunchbase, custom OSINT stacks)
Know how to deal with anti-bot protections, dynamic DOMs, session control, and throttling
Write modular, maintainable parsing code that won’t break with minor site changes
Are comfortable structuring messy data into well-formed, schema-compliant outputs
Bonus: Experience with financial, regulatory, or investor data parsing; Perplexity-style parsing; dark web scraping

🌍 Why This Role Matters

We’re not just scraping web pages—we’re building a structured knowledge engine for global capital intelligence. Your work will turn the open web into machine-readable signal that powers everything from mandate search to warm investor targeting to real-time enrichment.

The job ad is no longer active

From 4 years of experience

Candidates with 3 years of experience will be considered
Full Remote
Worldwide

Data and Analytics
Selenium, Data Scraping & Processing, Web Scraping, Python

Domain: SaaS
Startup
