Senior Data Scientist (RAG + Retrieval Expert)
Who We Are
At Bennett Data Science, we've spent over a decade pioneering the use of predictive analytics and data science for some of the biggest brands and retailers. We're at the top of our field because we focus on delivering actionable AI for our clients. Our deep experience and product-first attitude set us apart from other groups and get our clients the business results they want.
Why You Should Work With Us
You'll work with a wide range of clients at the cutting edge of innovation in their fields, tackling fascinating problems that support real products with real data. Our clients range from some of the largest companies in the world to small Silicon Valley startups building the next big thing.
- Expert Mentorship: Direct guidance from senior staff with 20+ years of applied ML experience
- Competitive Compensation: Market-rate pay with performance upside
- Fully Remote: Work from any location of your choice, on a flexible schedule
- Real Impact: Your models go into production and serve real users
The Role
As a Senior Data Scientist, you will design, build, and deploy RAG systems and other AI solutions that help teams manage land, infrastructure, and stakeholder relationships more effectively. You'll work with large volumes of documents, permits, geospatial records, and stakeholder communications, turning unstructured and semi-structured data into reliable, queryable intelligence.
You'll own projects end-to-end: scoping the problem with internal and client-facing stakeholders, selecting retrieval and generation strategies, building and evaluating pipelines, and iterating on production systems alongside senior data engineers. You'll also contribute to the team's broader ML work (classification, extraction, and geospatial modeling) and mentor junior team members as the practice grows.
This is a hands-on, senior individual contributor role. You'll need to be as comfortable explaining tradeoffs to a non-technical project manager as you are tuning a reranker or debugging retrieval recall. Client-facing communication is part of the job.
Requirements
A successful candidate has 5+ years of experience in applied data science and machine learning, with deep hands-on expertise in retrieval-augmented generation and a strong statistical foundation. They demonstrate the following:
RAG & Generative AI (Core Focus)
- Proven experience designing, building, and deploying RAG-based systems in production, including chunking strategies, embedding model selection, retrieval tuning, and reranking
- Strong understanding of hallucination mitigation techniques (grounding, citation enforcement, context window management, guardrails) and the ability to articulate tradeoffs across architectures
- Experience evaluating RAG pipeline quality end-to-end: retrieval recall, answer faithfulness, latency, and cost
- Familiarity with vector databases and hybrid search (e.g., Databricks, Pinecone, pgvector, or OpenSearch)
Applied ML & Engineering
- Production ML experience: building, deploying, and maintaining models that serve real users at scale
- Strong Python skills including scikit-learn, pandas, NumPy, and at least one deep learning framework (PyTorch or TensorFlow)
- Solid statistical foundation: hypothesis testing, distributions, probability, experimental design
Communication & Working Style
- Experience translating model behavior, recommendations, and limitations for non-technical stakeholders, particularly in domains where trust and accuracy are critical
- Comfort working independently across multiple projects simultaneously
- English proficiency at B2 or above (written and spoken)
Nice to Have
- Experience applying LLMs or transformer-based NLP for structured text classification, information extraction, or embedding-based retrieval
- Geospatial feature engineering, location-based statistics, spatial indexing, or proximity scoring, particularly in land, infrastructure, or utility corridor contexts
- Experience with Vision-Language Models (VLMs) or satellite/aerial imagery analysis for document or land parcel interpretation
- Experience with cloud ML platforms (AWS SageMaker, GCP Vertex AI)
- Exposure to utilities, energy, infrastructure, land management, or enterprise SaaS domains
- Experience fine-tuning or adapting pre-trained models (LoRA, PEFT, or full fine-tuning)
- Experience with agentic architectures and deployments
Required skills experience
| RAG systems | 2.5 years |
Required domain experience
| Machine Learning / Big Data | 5 years |
Required languages
| English | B2 - Upper Intermediate |
| Ukrainian | Native |