Machine Learning Specialist (Embeddings, Data Deduplication, AI-Powered Search)


Job Summary:
We are looking for a talented Machine Learning Specialist with expertise in embeddings (text and images), data processing, and AI-driven deduplication. The role involves designing intelligent systems to clean, normalize, and optimize large-scale datasets, improving product discovery and search. This is a fully remote opportunity where you’ll work on cutting-edge ML solutions for real-world retail and enterprise use cases.

 

Key Responsibilities:

  • Build and maintain scalable pipelines for AI-driven deduplication and record linkage across large datasets.
  • Develop and fine-tune image and text embedding models for classification, similarity, and search.
  • Apply computer vision techniques (image classification, feature extraction, multimodal learning).
  • Integrate ML models with relational and non-relational databases (PostgreSQL, MySQL, MongoDB, Redis).
  • Apply vector search technologies (e.g., FAISS, Milvus, Pinecone, Weaviate) to power semantic retrieval.
  • Research and implement methods for entity resolution, clustering, and anomaly detection.
  • Collaborate with data engineers to ensure efficient ETL, preprocessing, and feature engineering.
  • Evaluate model performance using precision/recall, ROC-AUC, F1-score, and business KPIs.
  • Document experiments and share insights with cross-functional stakeholders.

 

Must-Have Skills:

  • Strong experience with embeddings (text, images, multimodal, or product embeddings).
  • Hands-on experience in image classification, image embeddings, and computer vision tasks.
  • Proficiency in the Python ML stack: PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers.
  • Hands-on experience in AI-driven deduplication (fuzzy matching, clustering, record linkage).
  • Solid understanding of databases and query optimization.
  • Familiarity with vector search libraries and databases (FAISS, Pinecone, Milvus, etc.).
  • Strong problem-solving and analytical skills.
  • Fluent English (Upper-Intermediate or higher) for technical discussions and documentation.

 

Preferred Qualifications:

  • Experience with LLM-powered pipelines (RAG, prompt engineering, hybrid search).
  • Knowledge of data quality frameworks and large-scale data cleaning.
  • Familiarity with cloud ML platforms (AWS SageMaker, GCP Vertex AI, Azure ML).
  • Previous work in retail, e-commerce, or product catalog data is a plus.

 

 

Required languages

English B2 - Upper Intermediate
Published 19 August