Data Architect (AWS and Python FastAPI)
Client
Our client is a leading legal recruiting company focused on building a cutting-edge data-driven platform for lawyers and law firms. The platform consolidates news and analytics, real-time deal and case tracking from multiple sources, firm and lawyer profiles with cross-linked insights, rankings, and more — all in one unified place.
Position overview
We are seeking a skilled Data Architect with strong expertise in AWS technologies (Step Functions, Lambda, RDS for PostgreSQL), Python, and SQL to lead the design and implementation of the platform’s data architecture. This role involves defining data models, building ingestion pipelines, applying AI-driven entity resolution, and managing scalable, cost-effective infrastructure aligned with cloud best practices.
Responsibilities
- Define entities, relationships, and persistent IDs; enforce the Fact schema with confidence scores, timestamps, validation status, and source metadata.
- Blueprint ingestion workflows from law firm site feeds; normalize data, extract entities, classify content, and route low-confidence items for review.
- Develop a hybrid of deterministic rules and LLM-assisted matching; configure thresholds for auto-accept, manual review, or rejection.
- Specify Ops Portal checkpoints, data queues, and SLAs; create a corrections and version-history model.
- Stage a phased rollout of data sources, covering ingestion, processing, storage, replication, and management via a CMS.
- Align architecture with AWS and Postgres baselines; design for scalability, appropriate storage tiers, and cost-effective compute and queuing solutions.
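To give candidates a feel for the fact-validation and matching work described above, here is a minimal sketch in Python. It is illustrative only and is not the platform's actual schema: the `Fact` fields, the `ValidationStatus` values, and the threshold numbers (0.90 and 0.40) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ValidationStatus(Enum):
    # Hypothetical status values; the real Fact schema may differ.
    AUTO_ACCEPTED = "auto_accepted"
    NEEDS_REVIEW = "needs_review"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Thresholds:
    # Illustrative cut-offs only; real values would be tuned per source.
    auto_accept: float = 0.90
    reject: float = 0.40


@dataclass
class Fact:
    entity_id: str      # persistent ID of the firm or lawyer (assumed format)
    claim: str          # the extracted statement
    confidence: float   # match confidence in [0, 1]
    source_url: str     # source metadata
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    status: ValidationStatus = ValidationStatus.NEEDS_REVIEW


def route(fact: Fact, thresholds: Thresholds = Thresholds()) -> Fact:
    """Set the fact's status to auto-accept, manual review, or rejection."""
    if not 0.0 <= fact.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if fact.confidence >= thresholds.auto_accept:
        fact.status = ValidationStatus.AUTO_ACCEPTED
    elif fact.confidence < thresholds.reject:
        fact.status = ValidationStatus.REJECTED
    else:
        # Mid-confidence items go to a human review queue.
        fact.status = ValidationStatus.NEEDS_REVIEW
    return fact
```

In this sketch, a fact scored 0.95 would be auto-accepted, one scored 0.50 would be queued for manual review, and one scored 0.10 would be rejected. In practice the score could come from deterministic rules, LLM-assisted matching, or a combination of both.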
Requirements
- Proven experience as a Data Architect or Senior Data Engineer working extensively with AWS services.
- Strong proficiency in Python development, preferably with FastAPI or similar modern frameworks.
- Deep understanding of data modeling principles, entity resolution, and schema design for complex data systems.
- Hands-on experience designing and managing scalable data pipelines, workflows, and AI-driven data processing.
- Familiarity with relational databases such as PostgreSQL.
- Solid experience in data architecture, including data modeling, and knowledge of approaches such as Medallion architecture and dimensional modeling.
- Strong knowledge of cloud infrastructure cost optimization and performance tuning.
- Excellent problem-solving skills and ability to work in a collaborative, agile environment.
Nice to have
- Experience within legal tech or recruiting data domains.
- Familiarity with Content Management Systems (CMS) for managing data sources.
- Knowledge of data privacy, security regulations, and compliance standards.
- Experience with web scraping.
- Experience with EMR and SageMaker.
Required skills and experience
| AWS | 6 years |
| Python | 6 years |
| EMR |
Required languages
| English | B2 - Upper Intermediate |