Senior Infrastructure Engineer

We’re building a next-generation intelligence platform that extracts, enriches, and organizes complex institutional and investor data from across the public web — turning it into structured insight using cutting-edge LLMs and scalable pipelines.

You’ll be the founding engineer responsible for owning the entire data infrastructure: from orchestration to enrichment, from pipeline reliability to model deployment. You’ll work directly with the founder and a dedicated LLM engineer to build the brain of the platform.

This is a hands-on, high-impact role for a senior engineer who moves fast and builds for scale — someone who can set up Prefect DAGs, productionize LangChain pipelines, handle massive parser output, and deploy open-source LLMs when needed.


What You’ll Do

  • Design and run Prefect 2.0 workflows for parsing, enrichment, and QA feedback loops
  • Own LangChain-based enrichment pipelines: vectorization, chunking, RAG, summarization, deduplication
  • Deploy, scale, and monitor open-source LLMs (e.g., Mistral, Zephyr) using tools like vLLM or TGI
  • Build and maintain high-reliability ingestion pipelines that handle unstructured inputs from 5M+ pages
  • Collaborate with the LLM engineer to structure training data and support fine-tuning workflows
  • Set up confidence scoring, retry logic, logging, and low-confidence QA routing
  • Manage performance and cost of GPU resources (Lambda Labs, HuggingFace, or AWS)
  • Build data integrity checks, schema validators, and batch loaders for Postgres and vector DBs
  • Help scale the system to support 1M+ investor profiles and daily data refresh cycles
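To give a flavor of the work, here is a minimal sketch of the confidence scoring and low-confidence QA routing mentioned above. It is illustrative only: the threshold, field names, and `Router` class are hypothetical assumptions, not part of the actual platform.

```python
# Illustrative sketch: route records by extraction confidence.
# QA_THRESHOLD and the record fields are assumptions for this example.
from dataclasses import dataclass, field

QA_THRESHOLD = 0.8  # assumed cutoff; records below this go to human QA


@dataclass
class Router:
    accepted: list = field(default_factory=list)
    qa_queue: list = field(default_factory=list)

    def route(self, record: dict) -> str:
        # High-confidence records pass straight through to the store;
        # everything else is queued for manual QA review.
        if record.get("confidence", 0.0) >= QA_THRESHOLD:
            self.accepted.append(record)
            return "accepted"
        self.qa_queue.append(record)
        return "qa"


if __name__ == "__main__":
    router = Router()
    print(router.route({"name": "Fund A", "confidence": 0.95}))  # accepted
    print(router.route({"name": "Fund B", "confidence": 0.42}))  # qa
```

In production this routing would sit inside an orchestrated workflow (e.g. a Prefect task with retries), with the QA queue feeding the feedback loop described above.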

What We’re Looking For

  • 5–10+ years in data, backend, or ML infrastructure roles
  • Deep experience with Prefect 2.0 (or Airflow): orchestration, retries, and alerting
  • Strong command of LangChain, prompt pipelines, vector stores (Qdrant, Weaviate, FAISS)
  • Hands-on experience parsing large-scale web data (Playwright, Scrapy, Puppeteer, proxies, etc.)
  • Able to manage model deployment stacks (vLLM, TGI, Docker, inference serving)
  • Familiar with fine-tuning open-source models using Axolotl, HuggingFace, or PEFT
  • Fluent in Python, async workflows, and infrastructure that scales
  • Understands entity resolution, semantic deduplication, and QA scoring logic
  • Bonus: experience with graph-based entity networks or security-grade crawling systems

Why Join Us

  • You’ll own the full stack of one of the most technically ambitious data intelligence products in alt finance
  • You’ll work alongside a senior LLM engineer, 2 parser engineers, and QA — and ship every week
  • No bureaucracy, no fluff — real product, real adoption, real velocity
  • Remote, async-first team with deep venture + AI background
  • You’ll compete with and outperform companies like PitchBook, Harmonic, and Fintrx, with 1/10th the headcount
