Machine Learning Specialist (Embeddings, Data Deduplication, AI-Powered Search)
Job Summary:
We are looking for a talented Machine Learning Specialist with expertise in embeddings (text and images), data processing, and AI-driven deduplication. The role involves designing intelligent systems to clean, normalize, and optimize large-scale datasets, improving product discovery and search. This is a fully remote opportunity where you’ll work on cutting-edge ML solutions for real-world retail and enterprise use cases.
Key Responsibilities:
- Build and maintain scalable pipelines for AI-driven deduplication and record linkage across large datasets.
- Develop and fine-tune image and text embedding models for classification, similarity, and search.
- Apply computer vision techniques (image classification, feature extraction, multimodal learning).
- Integrate ML models with relational and non-relational databases (PostgreSQL, MySQL, MongoDB, Redis).
- Apply vector search technologies (e.g., FAISS, Milvus, Pinecone, Weaviate) to power semantic retrieval.
- Research and implement methods for entity resolution, clustering, and anomaly detection.
- Collaborate with data engineers to ensure efficient ETL, preprocessing, and feature engineering.
- Evaluate model performance using precision/recall, ROC-AUC, F1-score, and business KPIs.
- Document experiments and share insights with cross-functional stakeholders.
Must-Have Skills:
- Strong experience with embeddings (text, images, multimodal, or product embeddings).
- Hands-on experience in image classification, image embeddings, and computer vision tasks.
- Proficiency in the Python ML stack: PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers.
- Hands-on experience in AI-driven deduplication (fuzzy matching, clustering, record linkage).
- Solid understanding of databases and query optimization.
- Familiarity with vector search libraries and databases (FAISS, Pinecone, Milvus, etc.).
- Strong problem-solving and analytical skills.
- English at Upper-Intermediate (B2) level or higher for technical discussions and documentation.
Preferred Qualifications:
- Experience with LLM-powered pipelines (RAG, prompt engineering, hybrid search).
- Knowledge of data quality frameworks and large-scale data cleaning.
- Familiarity with cloud ML platforms (AWS SageMaker, GCP Vertex AI, Azure ML).
- Previous work in retail, e-commerce, or product catalog data is a plus.
Required Languages:
- English: B2 (Upper-Intermediate)
Average salary range of similar jobs: $4000-7000