Senior Infrastructure Engineer
We’re building a next-generation intelligence platform that extracts, enriches, and organizes complex institutional and investor data from across the public web — turning it into structured insight using cutting-edge LLMs and scalable pipelines.
You’ll be the founding engineer responsible for owning the entire data infrastructure: from orchestration to enrichment, from pipeline reliability to model deployment. You’ll work directly with the founder and a dedicated LLM engineer to build the brain of the platform.
This is a hands-on, high-impact role for a senior engineer who moves fast and builds for scale — someone who can set up Prefect DAGs, productionize LangChain pipelines, handle massive parser output, and deploy open-source LLMs when needed.
What You’ll Do
- Design and run Prefect 2.0 workflows for parsing, enrichment, and QA feedback loops
- Own LangChain-based enrichment pipelines: vectorization, chunking, RAG, summarization, deduplication
- Deploy, scale, and monitor open-source LLMs (e.g., Mistral, Zephyr) using tools like vLLM or TGI
- Build and maintain high-reliability ingestion pipelines that handle unstructured inputs from 5M+ pages
- Collaborate with the LLM engineer to structure training data and support fine-tuning workflows
- Set up confidence scoring, retry logic, logging, and low-confidence QA routing
- Manage performance and cost of GPU resources (Lambda Labs, HuggingFace, or AWS)
- Build data integrity checks, schema validators, and batch loaders for Postgres and vector DBs
- Help scale the system to support 1M+ investor profiles and daily data refresh cycles
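To make the "confidence scoring, retry logic, and low-confidence QA routing" responsibility concrete, here is a minimal, hypothetical sketch in plain Python (all names invented for illustration; in practice this logic would live inside a Prefect task with its own retry policy):

```python
# Hypothetical sketch: retry a flaky enrichment call, then route the result
# by confidence score -- high-confidence records ship downstream, low-confidence
# records go to a human QA queue instead.
CONFIDENCE_THRESHOLD = 0.8
MAX_RETRIES = 3

def enrich_with_retries(record, enrich_fn, qa_queue, output):
    """Run enrich_fn up to MAX_RETRIES times; route the result by confidence."""
    last_error = None
    for _ in range(MAX_RETRIES):
        try:
            enriched, confidence = enrich_fn(record)
        except Exception as exc:  # transient failure: retry
            last_error = exc
            continue
        if confidence >= CONFIDENCE_THRESHOLD:
            output.append(enriched)                  # confident: write downstream
        else:
            qa_queue.append((enriched, confidence))  # uncertain: human review
        return enriched
    # All retries exhausted: surface the failure for alerting.
    raise RuntimeError(f"enrichment failed after {MAX_RETRIES} attempts") from last_error
```

A real pipeline would replace the in-memory lists with a database table and a QA ticketing queue, but the routing decision stays the same.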
What We’re Looking For
- 5–10+ years in data, backend, or ML infrastructure roles
- Deep experience with Prefect 2.0 (or Airflow), orchestration, retries, and alerts
- Strong command of LangChain, prompt pipelines, vector stores (Qdrant, Weaviate, FAISS)
- Hands-on experience parsing large-scale web data (Playwright, Scrapy, Puppeteer, proxies, etc.)
- Able to manage model deployment stacks (vLLM, TGI, Docker, inference serving)
- Familiar with fine-tuning open-source models using Axolotl, HuggingFace, or PEFT
- Fluent in Python, async workflows, and infrastructure that scales
- Understands entity resolution, semantic deduplication, and QA scoring logic
- Bonus: experience with graph-based entity networks or security-grade crawling systems
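As a flavor of the "semantic deduplication" item above, a toy sketch using only the standard library (the threshold, names, and two-dimensional embeddings are invented for illustration; real pipelines would use vector-store embeddings and blocking/entity-resolution heuristics):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def dedupe(profiles, threshold=0.95):
    """Greedy semantic dedup: keep a profile only if it is not
    near-identical (by embedding similarity) to one already kept."""
    kept = []
    for name, vec in profiles:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((name, vec))
    return kept
```

With near-identical embeddings for "Acme Capital" and "ACME Capital LLC", only the first survives, while an unrelated profile is kept.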
Why Join Us
- You’ll own the full stack of one of the most technically ambitious data intelligence products in alt finance
- You’ll work alongside a senior LLM engineer, 2 parser engineers, and QA — and ship every week
- No bureaucracy, no fluff — real product, real adoption, real velocity
- Remote, async-first team with deep venture + AI background
- You’ll compete with and outperform companies like PitchBook, Harmonic, and Fintrx — with 1/10th the headcount