Data Science UA

Joined in 2019

Data Science UA is a service company with strong data science and AI expertise.

Our journey began in 2016 with the organization of the first Data Science UA conference, setting the foundation for our growth. Over the past 9 years, we have diligently fostered the largest Data Science community in Eastern Europe, with a network of over 30,000 top AI engineers.

For over 4 years, we have been at the forefront of establishing AI R&D centers in Western Europe for product companies from the USA and UK.

We offer diverse cooperation models, including outsourcing and outstaffing, where we assemble top-notch tech teams of industry experts to craft optimal solutions tailored to your business requirements.

At Data Science UA, our core focus revolves around AI consulting. Whether you possess a clearly defined request or just a nascent idea, we are eager to collaborate and explore possibilities together.

Additionally, our comprehensive recruiting service extends beyond AI and data science specialists: we find the best candidates for any level and specialization on request, bolstering teams worldwide.

Embrace the opportunity to partner with us at Data Science UA, and together, we can achieve extraordinary milestones for your enterprise. Reach out to us today, and let’s embark on this transformative journey hand in hand!

  • 54 views · 10 applications · 25d

    Computer Vision Lead

    Full Remote · Countries of Europe or Ukraine · 6 years of experience · B2 - Upper Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with the organization of the first Data Science UA conference, setting the foundation for our growth. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the role:
    We seek an experienced AI/ML team leader to join our Client’s startup team. As the technical lead, you will start as an individual technical contributor and later take on an engineering manager role for the team you will help hire. You will build cutting-edge end-to-end AI camera solutions and engage with customers and partners to understand their requirements, overseeing projects from inception to completion.

    Requirements:
    - Proven leadership experience with a track record of managing and developing technical teams;
    - Excellent customer-facing skills to understand and address client needs effectively;
    - Master's Degree in Computer Science or related field (PhD is a plus);
    - Solid grasp of machine learning and deep learning principles;
    - Strong experience in Computer Vision, including object detection, segmentation, tracking, keypoint/pose estimation;
    - Proven R&D mindset: capable of formulating and validating hypotheses independently, exploring novel approaches, and diving deep into model failures;
    - Proficiency in Python and deep learning frameworks;
    - Practical experience with state-of-the-art models, including different versions of YOLO and Transformer-based architectures (e.g., ViT, DETR, SAM); a short inference sketch follows this list;
    - Expertise in image and video processing using OpenCV;
    - Experience in model training, evaluation, and optimization;
    - Fluent written and verbal communication skills in English.
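
    For illustration only, a minimal sketch of the kind of detection workflow this role involves, assuming the ultralytics package and a small pretrained YOLOv8 checkpoint; the model choice and file names are illustrative assumptions, not project specifics:

    ```python
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # hypothetical small pretrained checkpoint

    # Run inference on one frame; each Results object exposes boxes, masks, keypoints.
    results = model("camera_frame.jpg")  # placeholder input path
    for r in results:
        for box in r.boxes:
            cls_id = int(box.cls[0])                # predicted class index
            score = float(box.conf[0])              # confidence score
            x1, y1, x2, y2 = box.xyxy[0].tolist()   # corner coordinates
            print(f"{model.names[cls_id]}: {score:.2f} at ({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
    ```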

    Would be a plus:
    - Experience applying ML techniques to embedded or resource-constrained environments (e.g., edge devices, mobile platforms, microcontrollers);
    - Ideally, you have led projects where ML models were optimized, deployed, or fine-tuned for embedded systems, ensuring high performance and low latency under hardware limitations.

    We offer:
    - Free English classes with a native speaker and external courses compensation;
    - PE support by professional accountants;
    - 40 days of PTO;
    - Medical insurance;
    - Team-building events, conferences, meetups, and other activities;
    - Many other benefits that you’ll learn about at the interview.

  • 95 views · 6 applications · 23d

    Data Engineer

    Full Remote · Ukraine · Product · 3 years of experience · B1 - Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is an IT company that develops technological solutions and products to help companies reach their full potential and meet the needs of their users. The team comprises over 600 specialists in IT and Digital, with solid expertise in various technology stacks necessary for creating complex solutions.

    About the role:
    We are looking for a Data Engineer (NLP-Focused) to build and optimize the data pipelines that fuel the Ukrainian LLM and NLP initiatives. In this role, you will design robust ETL/ELT processes to collect, process, and manage large-scale text and metadata, enabling the Data Scientists and ML Engineers to develop cutting-edge language models.

    You will work at the intersection of data engineering and machine learning, ensuring that the datasets and infrastructure are reliable, scalable, and tailored to the needs of training and evaluating NLP models in a Ukrainian language context.

    Requirements:
    - Education & Experience: 3+ years of experience as a Data Engineer or in a similar role, building data-intensive pipelines or platforms. A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is preferred. Experience supporting machine learning or analytics teams with data pipelines is a strong advantage.
    - NLP Domain Experience: Prior experience handling linguistic data or supporting NLP projects (e.g., text normalization, handling different encodings, tokenization strategies). Knowledge of Ukrainian text sources and data sets, or experience with multilingual data processing, can be an advantage given the project’s focus.
    Understanding of FineWeb2 or a similar processing pipeline approach.
    - Data Pipeline Expertise: Hands-on experience designing ETL/ELT processes, including extracting data from various sources, using transformation tools, and loading into storage systems. Proficiency with orchestration frameworks like Apache Airflow for scheduling workflows. Familiarity with building pipelines for unstructured data (text, logs) as well as structured data.
    - Programming & Scripting: Strong programming skills in Python for data manipulation and pipeline development. Experience with NLP packages (spaCy, NLTK, langdetect, fasttext, etc.). Experience with SQL for querying and transforming data in relational databases. Knowledge of Bash or other scripting for automation tasks. Writing clean, maintainable code and using version control (Git) for collaborative development.
    - Databases & Storage: Experience working with relational databases (e.g., PostgreSQL, MySQL), including schema design and query optimization. Familiarity with NoSQL or document stores (e.g., MongoDB) and big data technologies (HDFS, Hive, Spark) for large-scale data is a plus. Understanding of or experience with vector databases (e.g., Pinecone, FAISS) is beneficial, as the NLP applications may require embedding storage and fast similarity search.
    - Cloud Infrastructure: Practical experience with cloud platforms (AWS, GCP, or Azure) for data storage and processing. Ability to set up services such as S3/Cloud Storage, data warehouses (e.g., BigQuery, Redshift), and use cloud-based ETL tools or serverless functions. Understanding of infrastructure-as-code (Terraform, CloudFormation) to manage resources is a plus.
    - Data Quality & Monitoring: Knowledge of data quality assurance practices. Experience implementing monitoring for data pipelines (logs, alerts) and using CI/CD tools to automate pipeline deployment and testing. An analytical mindset to troubleshoot data discrepancies and optimize performance bottlenecks.
    - Collaboration & Domain Knowledge: Ability to work closely with data scientists and understand the requirements of machine learning projects. Basic understanding of NLP concepts and the data needs for training language models, so you can anticipate and accommodate the specific forms of text data and preprocessing they require. Good communication skills to document data workflows and to coordinate with team members across different functions.

    Nice to have:
    - Advanced Tools & Frameworks: Experience with distributed data processing frameworks (such as Apache Spark or Databricks) for large-scale data transformation, and with message streaming systems (Kafka, Pub/Sub) for real-time data pipelines. Familiarity with data serialization formats (JSON, Parquet) and handling of large text corpora.
    - Web Scraping Expertise: Deep experience in web scraping, using tools like Scrapy, Selenium, or Beautiful Soup, and handling anti-scraping challenges (rotating proxies, rate limiting). Ability to parse and clean raw text data from HTML, PDFs, or scanned documents.
    - CI/CD & DevOps: Knowledge of setting up CI/CD pipelines for data engineering (using GitHub Actions, Jenkins, or GitLab CI) to test and deploy changes to data workflows. Experience with containerization (Docker) to package data jobs and with Kubernetes for scaling them is a plus.
    - Big Data & Analytics: Experience with analytics platforms and BI tools (e.g., Tableau, Looker) used to examine the data prepared by the pipelines. Understanding of how to create and manage data warehouses or data marts for analytical consumption.
    - Problem-Solving: Demonstrated ability to work independently in solving complex data engineering problems, optimizing existing pipelines, and implementing new ones under time constraints. A proactive attitude to explore new data tools or techniques that could improve the workflows.

    Responsibilities:
    - Design, develop, and maintain ETL/ELT pipelines for gathering, transforming, and storing large volumes of text data and related information.
    - Ensure pipelines are efficient and can handle data from diverse sources (e.g., web crawls, public datasets, internal databases) while maintaining data integrity.
    - Implement web scraping and data collection services to automate the ingestion of text and linguistic data from the web and other external sources. This includes writing crawlers or using APIs to continuously collect data relevant to the language modeling efforts.
    - Implement NLP/LLM-specific data processing: text cleaning and normalization, filtering of toxic content, de-duplication, de-noising, and detection and removal of personal data.
    - Form task-specific SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as a teacher.
    - Set up and manage cloud-based data infrastructure for the project. Configure and maintain data storage solutions (data lakes, warehouses) and processing frameworks (e.g., distributed compute on AWS/GCP/Azure) that can scale with growing data needs.
    - Automate data processing workflows and ensure their scalability and reliability.
    - Use workflow orchestration tools like Apache Airflow to schedule and monitor data pipelines, enabling continuous and repeatable model training and evaluation cycles (a minimal DAG sketch follows this list).
    - Maintain and optimize analytical databases and data access layers for both ad-hoc analysis and model training needs.
    - Work with relational databases (e.g., PostgreSQL) and other storage systems to ensure fast query performance and well-structured data schemas.
    - Collaborate with Data Scientists and NLP Engineers to build data features and datasets for machine learning models.
    - Provide data subsets, aggregations, or preprocessing as needed for tasks such as language model training, embedding generation, and evaluation.
    - Implement data quality checks, monitoring, and alerting. Develop scripts or use tools to validate data completeness and correctness (e.g., ensuring no critical data gaps or anomalies in the text corpora), and promptly address any pipeline failures or data issues. Implement data version control.
    - Manage data security, access, and compliance.
    - Control permissions to datasets and ensure adherence to data privacy policies and security standards, especially when dealing with user data or proprietary text sources.
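
    As a rough illustration of the orchestration work described in this list, a minimal Airflow DAG sketch; the DAG id, schedule, and task bodies are placeholder assumptions rather than project specifics:

    ```python
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_text():
        """Placeholder: pull raw text from crawls, APIs, or internal stores."""


    def clean_and_load():
        """Placeholder: normalize, de-duplicate, and load into the warehouse."""


    with DAG(
        dag_id="ukrainian_text_etl",      # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_text", python_callable=extract_text)
        load = PythonOperator(task_id="clean_and_load", python_callable=clean_and_load)
        extract >> load  # loading runs only after extraction succeeds
    ```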

    The company offers:
    - Competitive salary.
    - Equity options in a fast-growing AI company.
    - Remote-friendly work culture.
    - Opportunity to shape a product at the intersection of AI and human productivity.
    - Work with a passionate, senior team building cutting-edge tech for real-world business use.

  • 42 views · 1 application · 25d

    Senior/Middle Data Scientist (Data Preparation, Pre-training)

    Full Remote · Ukraine · Product · 3 years of experience · B1 - Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is an IT company that develops technological solutions and products to help companies reach their full potential and meet the needs of their users. The team comprises over 600 specialists in IT and Digital, with solid expertise in various technology stacks necessary for creating complex solutions.

    About the role:
    We are looking for an experienced Senior/Middle Data Scientist with a passion for Large Language Models (LLMs) and cutting-edge AI research. In this role, you will focus on designing and prototyping data preparation pipelines, collaborating closely with data engineers to transform your prototypes into scalable production pipelines, and actively developing model training pipelines with other talented data scientists. Your work will directly shape the quality and capabilities of the models by ensuring we feed them the highest-quality, most relevant data possible.

    Requirements:
    Education & Experience:
    - 3+ years of experience in Data Science or Machine Learning, preferably with a focus on NLP.
    - Proven experience in data preprocessing, cleaning, and feature engineering for large-scale datasets of unstructured data (text, code, documents, etc.).
    - Advanced degree (Master’s or PhD) in Computer Science, Computational Linguistics, Machine Learning, or a related field is highly preferred.
    NLP Expertise:
    - Good knowledge of natural language processing techniques and algorithms.
    - Hands-on experience with modern NLP approaches, including embedding models, semantic search, text classification, sequence tagging (NER), transformers/LLMs, and RAG.
    - Familiarity with LLM training and fine-tuning techniques.
    ML & Programming Skills:
    - Proficiency in Python and common data science and NLP libraries (pandas, NumPy, scikit-learn, spaCy, NLTK, langdetect, fasttext).
    - Strong experience with deep learning frameworks such as PyTorch or TensorFlow for building NLP models.
    - Ability to write efficient, clean code and debug complex model issues.
    Data & Analytics:
    - Solid understanding of data analytics and statistics.
    - Experience in experimental design, A/B testing, and statistical hypothesis testing to evaluate model performance.
    - Comfortable working with large datasets, writing complex SQL queries, and using data visualization to inform decisions.
    Deployment & Tools:
    - Experience deploying machine learning models in production (e.g., using REST APIs or batch pipelines) and integrating with real-world applications.
    - Familiarity with MLOps concepts and tools (version control for models/data, CI/CD for ML).
    - Experience with cloud platforms (AWS, GCP, or Azure) and big data technologies (Spark, Hadoop, Ray, Dask) for scaling data processing or model training.
    Communication & Personality:
    - Experience working in a collaborative, cross-functional environment.
    - Strong communication skills to convey complex ML results to non-technical stakeholders and to document methodologies clearly.
    - Ability to rapidly prototype and iterate on ideas.

    Nice to have:
    Advanced NLP/ML Techniques:
    - Familiarity with evaluation metrics for language models (perplexity, BLEU, ROUGE, etc.) and with techniques for model optimization (quantization, knowledge distillation) to improve efficiency.
    - Understanding of FineWeb2 or a similar processing pipeline approach.
    Research & Community:
    - Publications in NLP/ML conferences or contributions to open-source NLP projects.
    - Active participation in the AI community or demonstrated continuous learning (e.g., Kaggle competitions, research collaborations) indicating a passion for staying at the forefront of the field.
    Domain & Language Knowledge:
    - Familiarity with the Ukrainian language and context.
    - Understanding of cultural and linguistic nuances that could inform model training and evaluation in a Ukrainian context.
    - Knowledge of Ukrainian text sources and data sets, or experience with multilingual data processing, can be an advantage given the project’s focus.
    MLOps & Infrastructure:
    - Hands-on experience with containerization (Docker) and orchestration (Kubernetes) for ML, as well as ML workflow tools (MLflow, Airflow).
    - Experience in working alongside MLOps engineers to streamline the deployment and monitoring of NLP models.
    Problem-Solving:
    - Innovative mindset with the ability to approach open-ended AI problems creatively.
    - Comfort in a fast-paced R&D environment where you can adapt to new challenges, propose solutions, and drive them to implementation.

    Responsibilities:
    - Design, prototype, and validate data preparation and transformation steps for LLM training datasets, including cleaning and normalization of text, filtering of toxic content, de-duplication, de-noising, detection and deletion of personal data, etc. (see the cleaning sketch after this list).
    - Form task-specific SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as a teacher.
    - Analyze large-scale raw text, code, and multimodal data sources for quality, coverage, and relevance.
    - Develop heuristics, filtering rules, and cleaning techniques to maximize training data effectiveness.
    - Collaborate with data engineers to hand over prototypes for automation and scaling.
    - Research and develop best practices and novel techniques in LLM training pipelines.
    - Monitor and evaluate the impact of data quality on model performance through experiments and benchmarks.
    - Research and implement best practices in large-scale dataset creation for AI/ML models.
    - Document methodologies and share insights with internal teams.
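
    A minimal sketch of the kind of cleaning and exact de-duplication steps listed above, assuming the langdetect package; the length threshold and language filter are illustrative assumptions:

    ```python
    import hashlib
    import re
    import unicodedata

    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException


    def normalize(text: str) -> str:
        """Unicode-normalize and collapse whitespace."""
        text = unicodedata.normalize("NFC", text)
        return re.sub(r"\s+", " ", text).strip()


    def prepare_corpus(docs):
        seen = set()
        for doc in docs:
            doc = normalize(doc)
            if len(doc) < 50:            # drop tiny fragments (illustrative threshold)
                continue
            try:
                if detect(doc) != "uk":  # keep Ukrainian-language text only
                    continue
            except LangDetectException:
                continue
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest in seen:           # exact de-duplication by content hash
                continue
            seen.add(digest)
            yield doc
    ```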

    The company offers:
    - Competitive salary.
    - Equity options in a fast-growing AI company.
    - Remote-friendly work culture.
    - Opportunity to shape a product at the intersection of AI and human productivity.
    - Work with a passionate, senior team building cutting-edge tech for real-world business use.

  • 43 views · 2 applications · 25d

    Data Engineer (NLP-Focused)

    Full Remote · Ukraine · Product · 3 years of experience · B1 - Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is an IT company that develops technological solutions and products to help companies reach their full potential and meet the needs of their users. The team comprises over 600 specialists in IT and Digital, with solid expertise in various technology stacks necessary for creating complex solutions.

    About the role:
    We are looking for a Data Engineer (NLP-Focused) to build and optimize the data pipelines that fuel the Ukrainian LLM and NLP initiatives. In this role, you will design robust ETL/ELT processes to collect, process, and manage large-scale text and metadata, enabling the Data Scientists and ML Engineers to develop cutting-edge language models.

    You will work at the intersection of data engineering and machine learning, ensuring that the datasets and infrastructure are reliable, scalable, and tailored to the needs of training and evaluating NLP models in a Ukrainian language context.

    Requirements:
    - Education & Experience: 3+ years of experience as a Data Engineer or in a similar role, building data-intensive pipelines or platforms. A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is preferred. Experience supporting machine learning or analytics teams with data pipelines is a strong advantage.
    - NLP Domain Experience: Prior experience handling linguistic data or supporting NLP projects (e.g., text normalization, handling different encodings, tokenization strategies). Knowledge of Ukrainian text sources and data sets, or experience with multilingual data processing, can be an advantage given the project’s focus.
    Understanding of FineWeb2 or a similar processing pipeline approach.
    - Data Pipeline Expertise: Hands-on experience designing ETL/ELT processes, including extracting data from various sources, using transformation tools, and loading into storage systems. Proficiency with orchestration frameworks like Apache Airflow for scheduling workflows. Familiarity with building pipelines for unstructured data (text, logs) as well as structured data.
    - Programming & Scripting: Strong programming skills in Python for data manipulation and pipeline development. Experience with NLP packages (spaCy, NLTK, langdetect, fasttext, etc.). Experience with SQL for querying and transforming data in relational databases. Knowledge of Bash or other scripting for automation tasks. Writing clean, maintainable code and using version control (Git) for collaborative development.
    - Databases & Storage: Experience working with relational databases (e.g., PostgreSQL, MySQL), including schema design and query optimization. Familiarity with NoSQL or document stores (e.g., MongoDB) and big data technologies (HDFS, Hive, Spark) for large-scale data is a plus. Understanding of or experience with vector databases (e.g., Pinecone, FAISS) is beneficial, as the NLP applications may require embedding storage and fast similarity search.
    - Cloud Infrastructure: Practical experience with cloud platforms (AWS, GCP, or Azure) for data storage and processing. Ability to set up services such as S3/Cloud Storage, data warehouses (e.g., BigQuery, Redshift), and use cloud-based ETL tools or serverless functions. Understanding of infrastructure-as-code (Terraform, CloudFormation) to manage resources is a plus.
    - Data Quality & Monitoring: Knowledge of data quality assurance practices. Experience implementing monitoring for data pipelines (logs, alerts) and using CI/CD tools to automate pipeline deployment and testing. An analytical mindset to troubleshoot data discrepancies and optimize performance bottlenecks.
    - Collaboration & Domain Knowledge: Ability to work closely with data scientists and understand the requirements of machine learning projects. Basic understanding of NLP concepts and the data needs for training language models, so you can anticipate and accommodate the specific forms of text data and preprocessing they require. Good communication skills to document data workflows and to coordinate with team members across different functions.

    Responsibilities:
    - Design, develop, and maintain ETL/ELT pipelines for gathering, transforming, and storing large volumes of text data and related information.
    - Ensure pipelines are efficient and can handle data from diverse sources (e.g., web crawls, public datasets, internal databases) while maintaining data integrity.
    - Implement web scraping and data collection services to automate the ingestion of text and linguistic data from the web and other external sources. This includes writing crawlers or using APIs to continuously collect data relevant to the language modeling efforts (see the crawler sketch after this list).
    - Implement NLP/LLM-specific data processing: text cleaning and normalization, filtering of toxic content, de-duplication, de-noising, and detection and removal of personal data.
    - Form task-specific SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as a teacher.
    - Set up and manage cloud-based data infrastructure for the project. Configure and maintain data storage solutions (data lakes, warehouses) and processing frameworks (e.g., distributed compute on AWS/GCP/Azure) that can scale with growing data needs.
    - Automate data processing workflows and ensure their scalability and reliability.
    - Use workflow orchestration tools like Apache Airflow to schedule and monitor data pipelines, enabling continuous and repeatable model training and evaluation cycles.
    - Maintain and optimize analytical databases and data access layers for both ad-hoc analysis and model training needs.
    - Work with relational databases (e.g., PostgreSQL) and other storage systems to ensure fast query performance and well-structured data schemas.
    - Collaborate with Data Scientists and NLP Engineers to build data features and datasets for machine learning models.
    - Provide data subsets, aggregations, or preprocessing as needed for tasks such as language model training, embedding generation, and evaluation.
    - Implement data quality checks, monitoring, and alerting. Develop scripts or use tools to validate data completeness and correctness (e.g., ensuring no critical data gaps or anomalies in the text corpora), and promptly address any pipeline failures or data issues. Implement data version control.
    - Manage data security, access, and compliance.
    - Control permissions to datasets and ensure adherence to data privacy policies and security standards, especially when dealing with user data or proprietary text sources.
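
    A minimal sketch of the collection step described above, assuming the requests and beautifulsoup4 packages; the URL, headers, and selectors are illustrative assumptions, and a production crawler would also honor robots.txt:

    ```python
    import time

    import requests
    from bs4 import BeautifulSoup

    HEADERS = {"User-Agent": "corpus-crawler/0.1 (hypothetical research crawler)"}


    def fetch_article_text(url: str) -> str:
        resp = requests.get(url, headers=HEADERS, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # Keep paragraph text only; real pipelines also strip navigation and ads.
        return "\n".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))


    seed_urls = ["https://example.com/article-1"]  # placeholder seed list
    for url in seed_urls:
        print(fetch_article_text(url)[:200])
        time.sleep(1.0)  # simple politeness delay between requests
    ```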

    The company offers:
    - Competitive salary.
    - Equity options in a fast-growing AI company.
    - Remote-friendly work culture.
    - Opportunity to shape a product at the intersection of AI and human productivity.
    - Work with a passionate, senior team building cutting-edge tech for real-world business use.

  • 44 views · 1 application · 25d

    Senior/Middle Data Scientist

    Full Remote · Ukraine · Product · 3 years of experience · B1 - Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is an IT company that develops technological solutions and products to help companies reach their full potential and meet the needs of their users. The team comprises over 600 specialists in IT and Digital, with solid expertise in various technology stacks necessary for creating complex solutions.

    About the role:
    We are looking for an experienced Senior/Middle Data Scientist with a passion for Large Language Models (LLMs) and cutting-edge AI research. In this role, you will focus on designing and prototyping data preparation pipelines, collaborating closely with data engineers to transform your prototypes into scalable production pipelines, and actively developing model training pipelines with other talented data scientists. Your work will directly shape the quality and capabilities of the models by ensuring we feed them the highest-quality, most relevant data possible.

    Requirements:
    Education & Experience:
    - 3+ years of experience in Data Science or Machine Learning, preferably with a focus on NLP.
    - Proven experience in data preprocessing, cleaning, and feature engineering for large-scale datasets of unstructured data (text, code, documents, etc.).
    - Advanced degree (Master’s or PhD) in Computer Science, Computational Linguistics, Machine Learning, or a related field is highly preferred.
    NLP Expertise:
    - Good knowledge of natural language processing techniques and algorithms.
    - Hands-on experience with modern NLP approaches, including embedding models, semantic search, text classification, sequence tagging (NER), transformers/LLMs, and RAG.
    - Familiarity with LLM training and fine-tuning techniques.
    ML & Programming Skills:
    - Proficiency in Python and common data science and NLP libraries (pandas, NumPy, scikit-learn, spaCy, NLTK, langdetect, fasttext).
    - Strong experience with deep learning frameworks such as PyTorch or TensorFlow for building NLP models.
    - Ability to write efficient, clean code and debug complex model issues.
    Data & Analytics:
    - Solid understanding of data analytics and statistics.
    - Experience in experimental design, A/B testing, and statistical hypothesis testing to evaluate model performance.
    - Comfortable working with large datasets, writing complex SQL queries, and using data visualization to inform decisions.
    Deployment & Tools:
    - Experience deploying machine learning models in production (e.g., using REST APIs or batch pipelines) and integrating with real-world applications.
    - Familiarity with MLOps concepts and tools (version control for models/data, CI/CD for ML).
    - Experience with cloud platforms (AWS, GCP, or Azure) and big data technologies (Spark, Hadoop, Ray, Dask) for scaling data processing or model training.
    Communication & Personality:
    - Experience working in a collaborative, cross-functional environment.
    - Strong communication skills to convey complex ML results to non-technical stakeholders and to document methodologies clearly.
    - Ability to rapidly prototype and iterate on ideas.

    Nice to have:
    Advanced NLP/ML Techniques:
    - Familiarity with evaluation metrics for language models (perplexity, BLEU, ROUGE, etc.) and with techniques for model optimization (quantization, knowledge distillation) to improve efficiency.
    - Understanding of FineWeb2 or a similar processing pipeline approach.
    Research & Community:
    - Publications in NLP/ML conferences or contributions to open-source NLP projects.
    - Active participation in the AI community or demonstrated continuous learning (e.g., Kaggle competitions, research collaborations) indicating a passion for staying at the forefront of the field.
    Domain & Language Knowledge:
    - Familiarity with the Ukrainian language and context.
    - Understanding of cultural and linguistic nuances that could inform model training and evaluation in a Ukrainian context.
    - Knowledge of Ukrainian text sources and data sets, or experience with multilingual data processing, can be an advantage given the project’s focus.
    MLOps & Infrastructure:
    - Hands-on experience with containerization (Docker) and orchestration (Kubernetes) for ML, as well as ML workflow tools (MLflow, Airflow).
    - Experience in working alongside MLOps engineers to streamline the deployment and monitoring of NLP models.
    Problem-Solving:
    - Innovative mindset with the ability to approach open-ended AI problems creatively.
    - Comfort in a fast-paced R&D environment where you can adapt to new challenges, propose solutions, and drive them to implementation.

    Responsibilities:
    - Design, prototype, and validate data preparation and transformation steps for LLM training datasets, including cleaning and normalization of text, filtering of toxic content, de-duplication, de-noising, detection and deletion of personal data, etc.
    - Form task-specific SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as a teacher (see the sketch after this list).
    - Analyze large-scale raw text, code, and multimodal data sources for quality, coverage, and relevance.
    - Develop heuristics, filtering rules, and cleaning techniques to maximize training data effectiveness.
    - Collaborate with data engineers to hand over prototypes for automation and scaling.
    - Research and develop best practices and novel techniques in LLM training pipelines.
    - Monitor and evaluate the impact of data quality on model performance through experiments and benchmarks.
    - Research and implement best practices in large-scale dataset creation for AI/ML models.
    - Document methodologies and share insights with internal teams.
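
    A minimal sketch of LLM-as-teacher labeling for SFT data, assuming the openai client library; the provider, model name, and JSON schema are illustrative assumptions, not choices made by the project:

    ```python
    import json

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment


    def label_example(document: str) -> dict:
        """Ask a teacher model for an instruction/response pair grounded in the text."""
        completion = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical teacher model
            messages=[
                {"role": "system", "content": (
                    "Given a text, write one instruction a user might ask about it "
                    "and answer it. Reply as JSON with keys 'instruction' and 'response'."
                )},
                {"role": "user", "content": document},
            ],
            response_format={"type": "json_object"},
        )
        return json.loads(completion.choices[0].message.content)


    # Each labeled pair becomes one SFT training record.
    record = label_example("A short article about the Ukrainian language ...")
    print(record["instruction"], "->", record["response"][:80])
    ```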

    The company offers:
    - Competitive salary.
    - Equity options in a fast-growing AI company.
    - Remote-friendly work culture.
    - Opportunity to shape a product at the intersection of AI and human productivity.
    - Work with a passionate, senior team building cutting-edge tech for real-world business use.

  • 67 views · 5 applications · 4d

    Senior/Middle Data Scientist

    Full Remote · Ukraine · Product · 3 years of experience · B1 - Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is an IT company that develops technological solutions and products to help companies reach their full potential and meet the needs of their users. The team comprises over 600 specialists in IT and Digital, with solid expertise in various technology stacks necessary for creating complex solutions.

    About the role:
    We are looking for an experienced Senior/Middle Data Scientist with a passion for Large Language Models (LLMs) and cutting-edge AI research. In this role, you will focus on designing and prototyping data preparation pipelines, collaborating closely with data engineers to transform your prototypes into scalable production pipelines, and actively developing model training pipelines with other talented data scientists. Your work will directly shape the quality and capabilities of the models by ensuring we feed them the highest-quality, most relevant data possible.

    Requirements:
    Education & Experience:
    - 3+ years of experience in Data Science or Machine Learning, preferably with a focus on NLP.
    - Proven experience in data preprocessing, cleaning, and feature engineering for large-scale datasets of unstructured data (text, code, documents, etc.).
    - Advanced degree (Master’s or PhD) in Computer Science, Computational Linguistics, Machine Learning, or a related field is highly preferred.
    NLP Expertise:
    - Good knowledge of natural language processing techniques and algorithms.
    - Hands-on experience with modern NLP approaches, including embedding models, semantic search, text classification, sequence tagging (NER), transformers/LLMs, and RAG.
    - Familiarity with LLM training and fine-tuning techniques.
    ML & Programming Skills:
    - Proficiency in Python and common data science and NLP libraries (pandas, NumPy, scikit-learn, spaCy, NLTK, langdetect, fasttext).
    - Strong experience with deep learning frameworks such as PyTorch or TensorFlow for building NLP models.
    - Ability to write efficient, clean code and debug complex model issues.
    Data & Analytics:
    - Solid understanding of data analytics and statistics.
    - Experience in experimental design, A/B testing, and statistical hypothesis testing to evaluate model performance.
    - Comfortable working with large datasets, writing complex SQL queries, and using data visualization to inform decisions.
    Deployment & Tools:
    - Experience deploying machine learning models in production (e.g., using REST APIs or batch pipelines) and integrating with real-world applications.
    - Familiarity with MLOps concepts and tools (version control for models/data, CI/CD for ML).
    - Experience with cloud platforms (AWS, GCP, or Azure) and big data technologies (Spark, Hadoop, Ray, Dask) for scaling data processing or model training.
    Communication & Personality:
    - Experience working in a collaborative, cross-functional environment.
    - Strong communication skills to convey complex ML results to non-technical stakeholders and to document methodologies clearly.
    - Ability to rapidly prototype and iterate on ideas.

    Nice to have:
    Advanced NLP/ML Techniques:
    - Familiarity with evaluation metrics for language models (perplexity, BLEU, ROUGE, etc.) and with techniques for model optimization (quantization, knowledge distillation) to improve efficiency.
    - Understanding of FineWeb2 or a similar processing pipeline approach.
    Research & Community:
    - Publications in NLP/ML conferences or contributions to open-source NLP projects.
    - Active participation in the AI community or demonstrated continuous learning (e.g., Kaggle competitions, research collaborations) indicating a passion for staying at the forefront of the field.
    Domain & Language Knowledge:
    - Familiarity with the Ukrainian language and context.
    - Understanding of cultural and linguistic nuances that could inform model training and evaluation in a Ukrainian context.
    - Knowledge of Ukrainian text sources and data sets, or experience with multilingual data processing, can be an advantage given the project’s focus.
    MLOps & Infrastructure:
    - Hands-on experience with containerization (Docker) and orchestration (Kubernetes) for ML, as well as ML workflow tools (MLflow, Airflow).
    - Experience in working alongside MLOps engineers to streamline the deployment and monitoring of NLP models.
    Problem-Solving:
    - Innovative mindset with the ability to approach open-ended AI problems creatively.
    - Comfort in a fast-paced R&D environment where you can adapt to new challenges, propose solutions, and drive them to implementation.

    Responsibilities:
    - Design, prototype, and validate data preparation and transformation steps for LLM training datasets, including cleaning and normalization of text, filtering of toxic content, de-duplication, de-noising, detection and deletion of personal data, etc.
    - Form task-specific SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as a teacher.
    - Analyze large-scale raw text, code, and multimodal data sources for quality, coverage, and relevance.
    - Develop heuristics, filtering rules, and cleaning techniques to maximize training data effectiveness.
    - Collaborate with data engineers to hand over prototypes for automation and scaling.
    - Research and develop best practices and novel techniques in LLM training pipelines.
    - Monitor and evaluate the impact of data quality on model performance through experiments and benchmarks (a perplexity sketch follows this list).
    - Research and implement best practices in large-scale dataset creation for AI/ML models.
    - Document methodologies and share insights with internal teams.
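
    A minimal sketch of one such benchmark, perplexity under a causal language model, assuming the transformers and torch libraries; the checkpoint is an illustrative placeholder:

    ```python
    import math

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder checkpoint; any causal LM works the same way
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()


    def perplexity(text: str) -> float:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # With labels equal to input_ids, the model returns mean cross-entropy loss.
            loss = model(**enc, labels=enc["input_ids"]).loss
        return math.exp(loss.item())


    # Lower perplexity on held-out text suggests a better-fitting training corpus.
    print(perplexity("The quick brown fox jumps over the lazy dog."))
    ```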

    The company offers:
    - Competitive salary.
    - Equity options in a fast-growing AI company.
    - Remote-friendly work culture.
    - Opportunity to shape a product at the intersection of AI and human productivity.
    - Work with a passionate, senior team building cutting-edge tech for real-world business use.

  • 83 views · 2 applications · 4d

    Data Engineer (NLP-Focused)

    Full Remote · Ukraine · Product · 3 years of experience · B1 - Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is an IT company that develops technological solutions and products to help companies reach their full potential and meet the needs of their users. The team comprises over 600 specialists in IT and Digital, with solid expertise in various technology stacks necessary for creating complex solutions.

    About the role:
    We are looking for a Data Engineer (NLP-Focused) to build and optimize the data pipelines that fuel the Ukrainian LLM and NLP initiatives. In this role, you will design robust ETL/ELT processes to collect, process, and manage large-scale text and metadata, enabling the Data Scientists and ML Engineers to develop cutting-edge language models.

    You will work at the intersection of data engineering and machine learning, ensuring that the datasets and infrastructure are reliable, scalable, and tailored to the needs of training and evaluating NLP models in a Ukrainian language context.

    Requirements:
    - Education & Experience: 3+ years of experience as a Data Engineer or in a similar role, building data-intensive pipelines or platforms. A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is preferred. Experience supporting machine learning or analytics teams with data pipelines is a strong advantage.
    - NLP Domain Experience: Prior experience handling linguistic data or supporting NLP projects (e.g., text normalization, handling different encodings, tokenization strategies). Knowledge of Ukrainian text sources and data sets, or experience with multilingual data processing, can be an advantage given the project’s focus.
    Understanding of FineWeb2 or a similar processing pipeline approach.
    - Data Pipeline Expertise: Hands-on experience designing ETL/ELT processes, including extracting data from various sources, using transformation tools, and loading into storage systems. Proficiency with orchestration frameworks like Apache Airflow for scheduling workflows. Familiarity with building pipelines for unstructured data (text, logs) as well as structured data.
    - Programming & Scripting: Strong programming skills in Python for data manipulation and pipeline development. Experience with NLP packages (spaCy, NLTK, langdetect, fasttext, etc.). Experience with SQL for querying and transforming data in relational databases. Knowledge of Bash or other scripting for automation tasks. Writing clean, maintainable code and using version control (Git) for collaborative development.
    - Databases & Storage: Experience working with relational databases (e.g., PostgreSQL, MySQL), including schema design and query optimization. Familiarity with NoSQL or document stores (e.g., MongoDB) and big data technologies (HDFS, Hive, Spark) for large-scale data is a plus. Understanding of or experience with vector databases (e.g., Pinecone, FAISS) is beneficial, as the NLP applications may require embedding storage and fast similarity search.
    - Cloud Infrastructure: Practical experience with cloud platforms (AWS, GCP, or Azure) for data storage and processing. Ability to set up services such as S3/Cloud Storage, data warehouses (e.g., BigQuery, Redshift), and use cloud-based ETL tools or serverless functions. Understanding of infrastructure-as-code (Terraform, CloudFormation) to manage resources is a plus.
    - Data Quality & Monitoring: Knowledge of data quality assurance practices. Experience implementing monitoring for data pipelines (logs, alerts) and using CI/CD tools to automate pipeline deployment and testing. An analytical mindset to troubleshoot data discrepancies and optimize performance bottlenecks.
    - Collaboration & Domain Knowledge: Ability to work closely with data scientists and understand the requirements of machine learning projects. Basic understanding of NLP concepts and the data needs for training language models, so you can anticipate and accommodate the specific forms of text data and preprocessing they require. Good communication skills to document data workflows and to coordinate with team members across different functions.

    Responsibilities:
    - Design, develop, and maintain ETL/ELT pipelines for gathering, transforming, and storing large volumes of text data and related information.
    - Ensure pipelines are efficient and can handle data from diverse sources (e.g., web crawls, public datasets, internal databases) while maintaining data integrity.
    - Implement web scraping and data collection services to automate the ingestion of text and linguistic data from the web and other external sources. This includes writing crawlers or using APIs to continuously collect data relevant to the language modeling efforts.
    - Implement NLP/LLM-specific data processing: text cleaning and normalization, filtering of toxic content, de-duplication, de-noising, and detection and removal of personal data.
    - Form task-specific SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as a teacher.
    - Set up and manage cloud-based data infrastructure for the project. Configure and maintain data storage solutions (data lakes, warehouses) and processing frameworks (e.g., distributed compute on AWS/GCP/Azure) that can scale with growing data needs.
    - Automate data processing workflows and ensure their scalability and reliability.
    - Use workflow orchestration tools like Apache Airflow to schedule and monitor data pipelines, enabling continuous and repeatable model training and evaluation cycles.
    - Maintain and optimize analytical databases and data access layers for both ad-hoc analysis and model training needs.
    - Work with relational databases (e.g., PostgreSQL) and other storage systems to ensure fast query performance and well-structured data schemas.
    - Collaborate with Data Scientists and NLP Engineers to build data features and datasets for machine learning models.
    - Provide data subsets, aggregations, or preprocessing as needed for tasks such as language model training, embedding generation, and evaluation.
    - Implement data quality checks, monitoring, and alerting. Develop scripts or use tools to validate data completeness and correctness (e.g., ensuring no critical data gaps or anomalies in the text corpora), and promptly address any pipeline failures or data issues. Implement data version control. A minimal validation sketch follows this list.
    - Manage data security, access, and compliance.
    - Control permissions to datasets and ensure adherence to data privacy policies and security standards, especially when dealing with user data or proprietary text sources.
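
    A minimal sketch of such batch-level validation, assuming pandas; the column name and thresholds are illustrative assumptions:

    ```python
    import pandas as pd


    def validate_batch(df: pd.DataFrame) -> list:
        """Return human-readable issues found in one ingested batch."""
        issues = []
        if df.empty:
            issues.append("batch is empty")
            return issues
        null_share = df["text"].isna().mean()
        if null_share > 0.01:                  # more than 1% missing text (example threshold)
            issues.append(f"{null_share:.1%} of rows have no text")
        dup_share = df.duplicated(subset=["text"]).mean()
        if dup_share > 0.05:                   # more than 5% exact duplicates
            issues.append(f"{dup_share:.1%} duplicate rows")
        return issues


    batch = pd.DataFrame({"text": ["a", "a", None, "b"]})
    for issue in validate_batch(batch):
        print("ALERT:", issue)  # a real pipeline would alert or fail the run instead
    ```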

    The company offers:
    - Competitive salary.
    - Equity options in a fast-growing AI company.
    - Remote-friendly work culture.
    - Opportunity to shape a product at the intersection of AI and human productivity.
    - Work with a passionate, senior team building cutting-edge tech for real-world business use.

  • 64 views · 2 applications · 4d

    Senior/Middle Data Scientist (Data Preparation, Pre-training)

    Full Remote · Ukraine · Product · 3 years of experience · B1 - Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is an IT company that develops technological solutions and products to help companies reach their full potential and meet the needs of their users. The team comprises over 600 specialists in IT and Digital, with solid expertise in various technology stacks necessary for creating complex solutions.

    About the role:
    We are looking for an experienced Senior/Middle Data Scientist with a passion for Large Language Models (LLMs) and cutting-edge AI research. In this role, you will focus on designing and prototyping data preparation pipelines, collaborating closely with data engineers to transform your prototypes into scalable production pipelines, and actively developing model training pipelines with other talented data scientists. Your work will directly shape the quality and capabilities of the models by ensuring we feed them the highest-quality, most relevant data possible.

    Requirements:
    Education & Experience:
    - 3+ years of experience in Data Science or Machine Learning, preferably with a focus on NLP.
    - Proven experience in data preprocessing, cleaning, and feature engineering for large-scale datasets of unstructured data (text, code, documents, etc.).
    - Advanced degree (Master’s or PhD) in Computer Science, Computational Linguistics, Machine Learning, or a related field is highly preferred.
    NLP Expertise:
    - Good knowledge of natural language processing techniques and algorithms.
    - Hands-on experience with modern NLP approaches, including embedding models, semantic search, text classification, sequence tagging (NER), transformers/LLMs, and RAG.
    - Familiarity with LLM training and fine-tuning techniques.
    ML & Programming Skills:
    - Proficiency in Python and common data science and NLP libraries (pandas, NumPy, scikit-learn, spaCy, NLTK, langdetect, fasttext).
    - Strong experience with deep learning frameworks such as PyTorch or TensorFlow for building NLP models.
    - Ability to write efficient, clean code and debug complex model issues.
    Data & Analytics:
    - Solid understanding of data analytics and statistics.
    - Experience in experimental design, A/B testing, and statistical hypothesis testing to evaluate model performance.
    - Comfortable working with large datasets, writing complex SQL queries, and using data visualization to inform decisions.
    Deployment & Tools:
    - Experience deploying machine learning models in production (e.g., using REST APIs or batch pipelines) and integrating with real-world applications.
    - Familiarity with MLOps concepts and tools (version control for models/data, CI/CD for ML).
    - Experience with cloud platforms (AWS, GCP, or Azure) and big data technologies (Spark, Hadoop, Ray, Dask) for scaling data processing or model training.
    Communication & Personality:
    - Experience working in a collaborative, cross-functional environment.
    - Strong communication skills to convey complex ML results to non-technical stakeholders and to document methodologies clearly.
    - Ability to rapidly prototype and iterate on ideas.

    Nice to have:
    Advanced NLP/ML Techniques:
    - Familiarity with evaluation metrics for language models (perplexity, BLEU, ROUGE, etc.) and with techniques for model optimization (quantization, knowledge distillation) to improve efficiency.
    - Understanding of FineWeb2 or a similar processing pipeline approach.
    Research & Community:
    - Publications in NLP/ML conferences or contributions to open-source NLP projects.
    - Active participation in the AI community or demonstrated continuous learning (e.g., Kaggle competitions, research collaborations) indicating a passion for staying at the forefront of the field.
    Domain & Language Knowledge:
    - Familiarity with the Ukrainian language and context.
    - Understanding of cultural and linguistic nuances that could inform model training and evaluation in a Ukrainian context.
    - Knowledge of Ukrainian text sources and data sets, or experience with multilingual data processing, can be an advantage given the project’s focus.
    MLOps & Infrastructure:
    - Hands-on experience with containerization (Docker) and orchestration (Kubernetes) for ML, as well as ML workflow tools (MLflow, Airflow).
    - Experience in working alongside MLOps engineers to streamline the deployment and monitoring of NLP models.
    Problem-Solving:
    - Innovative mindset with the ability to approach open-ended AI problems creatively.
    - Comfort in a fast-paced R&D environment where you can adapt to new challenges, propose solutions, and drive them to implementation.

    Responsibilities:
    - Design, prototype, and validate data preparation and transformation steps for LLM training datasets, including cleaning and normalization of text, filtering of toxic content, de-duplication, de-noising, detection and deletion of personal data, etc. (see the near-duplicate sketch after this list).
    - Form task-specific SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as a teacher.
    - Analyze large-scale raw text, code, and multimodal data sources for quality, coverage, and relevance.
    - Develop heuristics, filtering rules, and cleaning techniques to maximize training data effectiveness.
    - Collaborate with data engineers to hand over prototypes for automation and scaling.
    - Research and develop best practices and novel techniques in LLM training pipelines.
    - Monitor and evaluate the impact of data quality on model performance through experiments and benchmarks.
    - Research and implement best practices in large-scale dataset creation for AI/ML models.
    - Document methodologies and share insights with internal teams.
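
    A minimal sketch of near-duplicate detection, one common de-duplication technique for pre-training corpora, assuming the datasketch package; the shingle size and similarity threshold are illustrative assumptions:

    ```python
    from datasketch import MinHash, MinHashLSH


    def minhash(text: str, k: int = 5) -> MinHash:
        m = MinHash(num_perm=128)
        for i in range(max(len(text) - k + 1, 1)):
            m.update(text[i:i + k].encode("utf-8"))  # character k-shingles
        return m


    lsh = MinHashLSH(threshold=0.8, num_perm=128)    # Jaccard >= 0.8 counts as a near-duplicate
    kept = []
    docs = ["the cat sat on the mat", "the cat sat on the mat.", "hello world"]
    for idx, doc in enumerate(docs):
        m = minhash(doc)
        if lsh.query(m):      # a similar document was already kept
            continue
        lsh.insert(str(idx), m)
        kept.append(doc)
    print(kept)  # the near-identical second document is dropped
    ```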

    The company offers:
    - Competitive salary.
    - Equity options in a fast-growing AI company.
    - Remote-friendly work culture.
    - Opportunity to shape a product at the intersection of AI and human productivity.
    - Work with a passionate, senior team building cutting-edge tech for real-world business use.

  • 19 views · 0 applications · 4d

    Senior Data Scientist/NLP Lead

    Office Work · Ukraine (Kyiv) · Product · 5 years of experience · B2 - Upper Intermediate

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.
     

    About the client:
    Our client is an IT company that develops technological solutions and products to help companies reach their full potential and meet the needs of their users. The team comprises over 600 specialists in IT and Digital, with solid expertise in various technology stacks necessary for creating complex solutions.
     

    About the role:
    We are looking for an experienced Senior Data Scientist / NLP Lead to spearhead the development of cutting-edge natural language processing solutions for the Ukrainian LLM project. You will lead the NLP team in designing, implementing, and deploying large-scale language models and NLP algorithms that power the products.

    This role is critical to the mission of advancing AI in the Ukrainian language context, and offers the opportunity to drive technical decisions, mentor a team of data scientists, and shape the future of AI capabilities in Ukraine.
     

    Requirements:
    Education & Experience:
    - 5+ years of experience in data science or machine learning, with a strong focus on NLP.
    - Proven track record of developing and deploying NLP or ML models at scale in production environments.
    - An advanced degree (Master’s or PhD) in Computer Science, Computational Linguistics, Machine Learning, or a related field is highly preferred.

    NLP Expertise:
    - Deep understanding of natural language processing techniques and algorithms.
    - Hands-on experience with modern NLP approaches, including embedding models, text classification, sequence tagging (NER), and transformers/LLMs.
    - Deep understanding of transformer architectures, knowledge of LLM training and fine-tuning techniques, hands-on experience building LLM-based solutions, and awareness of linguistic nuances in Ukrainian or other languages.

    Advanced NLP/ML Techniques:
    - Experience with evaluation metrics for language models (perplexity, BLEU, ROUGE, etc.) and with techniques for model optimization (quantization, knowledge distillation) to improve efficiency.
    - Background in information retrieval or RAG (Retrieval-Augmented Generation) is a plus for building systems that augment LLMs with external knowledge.
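
    To make the RAG requirement concrete, here is a hedged sketch of the retrieval step only; the embeddings are random stand-ins for a real embedding model's output:

    ```python
    import numpy as np

    def top_k(query_vec, doc_vecs, k=3):
        """Rank documents by cosine similarity to the query embedding."""
        q = query_vec / np.linalg.norm(query_vec)
        d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
        return np.argsort(-(d @ q))[:k]

    # Four toy documents with random 5-dim "embeddings" (stand-ins for a real
    # embedding model's output).
    docs = ["doc A", "doc B", "doc C", "doc D"]
    vecs = np.random.default_rng(0).normal(size=(4, 5))
    query = vecs[2] + 0.05 * np.random.default_rng(1).normal(size=5)  # ~doc C

    for i in top_k(query, vecs, k=2):
        print(docs[i])
    ```

    The passages returned by top_k would then be prepended to the LLM prompt as external knowledge.
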
    ML & Programming Skills:
    - Proficiency in Python and common data science libraries (pandas, NumPy, scikit-learn).
    - Strong experience with deep learning frameworks such as PyTorch or TensorFlow for building NLP models.
    - Ability to write efficient, clean code and debug complex model issues.

    Data & Analytics:
    - Solid understanding of data analytics and statistics.
    - Experience in experimental design, A/B testing, and statistical hypothesis testing to evaluate model performance.
    - Experience building a representative LLM benchmarking framework from business requirements.
    - Comfortable working with large datasets, writing complex SQL queries, and using data visualization to inform decisions.

    Deployment & Tools:
    - Experience deploying machine learning models in production (e.g., using REST APIs or batch pipelines) and integrating with real-world applications.
    - Familiarity with MLOps concepts and tools (version control for models/data, CI/CD for ML).
    - Experience with cloud platforms (AWS, GCP or Azure) and big data technologies (Spark, Hadoop) for scaling data processing or model training is a plus.
    - Hands-on experience with containerization (Docker) and orchestration (Kubernetes) for ML, as well as ML workflow tools (MLflow, Airflow).
    Leadership & Communication:
    - Demonstrated ability to lead technical projects and mentor junior team members.
    - Strong communication skills to convey complex ML results to non-technical stakeholders and to document methodologies clearly.

     

    Responsibilities:
    - Lead end-to-end development of NLP and LLM models - from data exploration and model prototyping to validation and production deployment. This includes designing novel model architectures or fine-tuning state-of-the-art transformer models (e.g., BERT, GPT) to solve project-specific language tasks (a fine-tuning sketch follows this list).
    - Analyze large text datasets (Ukrainian and multilingual corpora) to extract insights and build robust training datasets.
    - Guide data collection and annotation efforts to ensure high-quality data for model training.
    - Develop and implement NLP algorithms for a range of tasks such as text classification, named entity recognition, semantic search, and conversational AI.
    - Stay up-to-date with the latest research to apply transformer-based models, embeddings, and other modern NLP techniques in the solutions.
    - Establish evaluation metrics and validation frameworks for model performance, including accuracy, factuality, and bias.
    - Design A/B tests and statistical experiments to compare model variants and validate improvements.
    - Deploy and integrate NLP models into production systems in collaboration with engineers - ensuring models are scalable, efficient, and well-monitored in a real-world setting.
    - Optimize model inference and troubleshoot issues such as model drift or data pipeline bottlenecks.
    - Provide technical leadership and mentorship to the NLP/ML team.
    - Review code and research, uphold best practices in ML (version control, reproducibility, documentation), and foster a culture of continuous learning and innovation.
    - Collaborate cross-functionally with product managers, software engineers, and MLOps engineers to align NLP solutions with product goals and infrastructure capabilities.
    - Communicate complex data science concepts to stakeholders and incorporate their feedback into model development.
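
    As a hedged illustration of the fine-tuning work described in this list, here is a minimal sequence-classification sketch using the Hugging Face stack; the model choice, labels, and two-example dataset are placeholders rather than the project's actual setup:

    ```python
    # Placeholders throughout: model, labels, and the toy dataset are illustrative.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    name = "bert-base-multilingual-cased"  # covers Ukrainian among ~100 languages
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

    ds = Dataset.from_dict({
        "text": ["Дуже добрий сервіс", "Жахливий досвід"],  # toy sentiment pair
        "label": [1, 0],
    }).map(lambda b: tok(b["text"], truncation=True, padding="max_length",
                         max_length=32), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=ds,
    )
    trainer.train()  # fine-tunes the full model, including the new head
    ```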

     

    The company offers:
    - Competitive salary.
    - Equity options in a fast-growing AI company.
    - Remote-friendly work culture.
    - Opportunity to shape a product at the intersection of AI and human productivity.
    - Work with a passionate, senior team building cutting-edge tech for real-world business use.

  • · 42 views · 2 applications · 25d

    AI Datasets Lead/Data Annotation Lead

    Hybrid Remote · Ukraine (Kyiv) · Product · 3 years of experience · B2 - Upper Intermediate
    About us: Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with the organization of the first Data Science UA conference, setting the foundation for our growth. Over the past 8 years, we have...

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with the organization of the first Data Science UA conference, setting the foundation for our growth. Over the past 8 years, we have diligently fostered the largest Data Science Community in Eastern Europe, boasting a network of over 30,000 top AI engineers.

    About the client and the role:
    Our client is a software company that develops and distributes software for macOS and iOS. We are seeking an AI Datasets Lead (data generation & annotation) to join the AI team, overseeing the entire data annotation process and supporting AI and machine learning initiatives. This role requires strong leadership skills combined with a good understanding of data collection and annotation for machine learning models.

     

    Requirements:

    • Proven experience leading a data annotation team
    • Basic understanding of AI/ML concepts (data annotation needs, task types, neural network training)
    • Good knowledge of data labeling processes and familiarity with relevant tools, platforms, and best practices in the data annotation field
    • Basic understanding of data collection processes (web scraping, dataset management)
    • Strong leadership, communication, and organizational skills with the ability to convey technical aspects to both technical and non-technical stakeholders
    • Strong ability to manage uncertainty and bring clarity to unstructured situations β€” handle incoming requests without predefined processes, structure the chaos, and progressively build clear, actionable workflows for the team
    • Experience with quality assurance processes for large datasets
    • Keen attention to detail and the ability to balance multiple projects and priorities simultaneously
    • Openness to frequent feedback from multiple stakeholders and readiness to quickly adapt processes in real time based on that input
    • Experience with data privacy regulations and ethical considerations in data annotation
    • Strong analytical skills to monitor processes and implement improvements based on data-driven insights
    • Ability to develop and implement data quality metrics and KPIs to measure the effectiveness of annotation processes
    • Experience managing budgets & external partnerships
    • Upper-intermediate level of English.

     

    Responsibilities:

    • Organize and supervise the data annotation process, ensuring its accuracy and quality
    • Build and lead the data annotation team, coordinate team tasks and workflows, and foster the team's development
    • Implement quality control processes, analyze errors, and improve standards
    • Develop and maintain annotation guidelines and documentation to ensure consistency and accuracy across the team
    • Optimize annotation processes and explore automation opportunities
    • Closely collaborate with data engineers to develop efficient data processing solutions and tools, and implement automated data processing workflows to streamline annotation processes
    • Collaborate closely with ML Engineers to align data annotation efforts with machine learning model requirements and understand the neural network training process
    • Take part in cross-functional coordination and cooperation with other colleagues
    • Manage costs related to data sourcing and annotation
    • Explore outsourcing and alternative data solutions; manage cooperation with external freelancers and partners, including finding, selecting, and onboarding them to meet organizational needs
    • Ensure compliance with data privacy and security regulations throughout the data annotation process


    The company offers:

    • Hybrid work model
    • Health and mental programs
    • Flexible working hours
    • Time-off policy
    • Team and growth
    • Different social initiatives.
  • · 119 views · 36 applications · 25d

    Senior Full Stack Engineer

    Full Remote · Countries of Europe or Ukraine · Product · 5 years of experience · B2 - Upper Intermediate
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of...

    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Join the fast-moving remote team building AI-driven financial tools for the U.S. auto industry.
    The product targets the B2B segment, specifically Sales and Marketing Managers in the US Automotive niche. Its goal is to close existing data gaps while delivering a modern UI/UX experience.

    Duration: 6+ months (and ongoing)
     

    Tech Stack:

    • Frontend: React, Elastic UI Framework (replaced with Microsoft Fluent UI)
    • Backend: Nest, Service Architecture, Redis, Mongo
    • Tools & Processes: Jira/Confluence/Slack/Cursor

     

    What You’ll Be Doing:

    • Build and maintain user-facing features with high performance and responsiveness.
    • Collaborate with the product manager, QA engineers, and designers to deliver intuitive, impactful functionality.
    • Contribute to architectural decisions and ensure clean, maintainable code across the stack.
    • Optimize the app for maximum speed, performance, and scalability.
    • Participate in code reviews, live coding sessions, and technical planning.
    • Work independently and drive solutions end-to-end with minimal supervision.

     

    What We’re Looking For:

    1. 6+ years of experience in software engineering, including:
      - 4+ years focused on Node/Next/Nest backend development;
      - 2+ years of React frontend experience.
    2. Solid knowledge of JavaScript/TypeScript and frameworks (React is a must; near pixel-perfect layouts).
    3. Experience with backend technologies (Node.js, Express, Nest).
    4. Understanding of CI/CD pipelines and cloud services (AWS preferred).
    5. Clear and confident English communication skills (B2+ level).
    6. Proactive mindset, problem-solving attitude, and ability to work autonomously (a must).
    7. Experience with Playwright for end-to-end tests is a big plus.

     

    Hiring Process:

    1. Tech Screening (30 mins)
    2. Live Coding Session (45–60 mins)
    3. Offer Call (15 mins)
  • · 113 views · 13 applications · 15d

    Middle Python Developer

    Full Remote · Ukraine · Product · 3 years of experience · B2 - Upper Intermediate
    About us: Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently...

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is a quantitative digital assets market-neutral fund specializing in algorithmic trading across centralized and decentralized markets. The company designs, builds, and runs high-performance systems that power 24/7 trading strategies at scale. Its edge lies in combining advanced quantitative research with world-class engineering to capture opportunities in the rapidly evolving digital asset markets.

    About the role:
    We are looking for a Middle Python Developer with 3–4 years of experience in building and optimizing high-load, low-latency systems. This is a hands-on engineering role where you will enhance the execution platform and market data infrastructure, design and implement connectors for new centralized and decentralized exchanges, perform latency profiling and code optimization to achieve maximum throughput and minimal jitter, and work with large-scale data pipelines to support real-time and historical analysis.

    Requirements:
    - 3–4 years of professional Python development in high-performance, high-load environments.
    - Expert in concurrency and parallel programming (asyncio, multiprocessing, threading, Dask); see the asyncio sketch after this list.
    - Proficiency with scientific computing libraries: NumPy, Pandas, SciPy.
    - Advanced skills in SQL and NoSQL databases; deep expertise with ClickHouse or equivalent high-volume solutions.
    - Strong knowledge of AWS (compute, storage, streaming, orchestration).
    - Strong knowledge of messaging brokers (Kafka, RabbitMQ, ZeroMQ).
    - Solid foundation in algorithms, data structures, and system design.
    - Comfortable with Linux environments, Docker, Git.
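
    A toy illustration of the concurrency requirement above: with asyncio, independent I/O-bound calls overlap, so wall time tracks the slowest call rather than the sum. The venue names and latencies are made up:

    ```python
    import asyncio
    import time

    # Hypothetical venues; the sleeps stand in for network round-trips.
    VENUES = {"venue_a": 0.12, "venue_b": 0.05, "venue_c": 0.20}

    async def fetch_book(venue: str, latency: float):
        await asyncio.sleep(latency)  # placeholder for an HTTP/WebSocket call
        return venue

    async def main():
        start = time.perf_counter()
        # gather() overlaps the waits: total time ~0.20s, not 0.37s.
        done = await asyncio.gather(*(fetch_book(v, s) for v, s in VENUES.items()))
        print(done, f"elapsed: {time.perf_counter() - start:.2f}s")

    asyncio.run(main())
    ```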

    Nice to have:
    - Knowledge of Go for latency-critical modules.
    - Experience with C++/Rust for ultra-low-latency systems.
    - Familiarity with time-series analysis, quantitative finance, or crypto trading.
    - Exposure to profiling and debugging tools (perf, flamegraphs, cProfile, py-spy).

    Responsibilities:
    - Develop and optimize Python-based trading infrastructure for high-performance execution.
    - Implement and tune parallel and concurrent systems using libraries such as multiprocessing, threading, asyncio, aiohttp, Dask, and joblib.
    - Build and maintain data pipelines using NumPy, Pandas, SciPy, with storage in SQL and NoSQL databases (especially ClickHouse, PostgreSQL, Redis).
    - Architect and scale systems on AWS, leveraging EC2, S3, Lambda, and related services.
    - Design and integrate with messaging brokers (Kafka, RabbitMQ, ZeroMQ) for distributed trading workflows.
    - Perform latency analysis, profiling, and performance optimization across trading components (see the profiling sketch below).
    - Contribute to CI/CD, testing, and best practices in code quality.
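
    A minimal sketch of the latency/profiling work above, using only the standard library; hot_path is a stand-in for real trading code:

    ```python
    import cProfile
    import pstats

    def hot_path():
        # Stand-in for a real hot path (order serialization, book updates, etc.).
        return sum(i * i for i in range(100_000))

    cProfile.run("hot_path()", "trade.prof")       # profile one call, save stats
    stats = pstats.Stats("trade.prof")
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    ```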

    The company offers:
    - Directly shape the core trading infrastructure of a high-performing crypto hedge fund.
    - Flat, high-impact environment where engineering directly drives P&L.
    - Access to cutting-edge problems in latency optimization and distributed systems.

  • · 74 views · 2 applications · 9d

    Python Developer

    Full Remote · Ukraine · Product · 3 years of experience · B2 - Upper Intermediate
    About us: Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently...

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is a quantitative digital assets market-neutral fund specializing in algorithmic trading across centralized and decentralized markets. The company designs, builds, and runs high-performance systems that power 24/7 trading strategies at scale. Its edge lies in combining advanced quantitative research with world-class engineering to capture opportunities in the rapidly evolving digital asset markets.

    About the role:
    We are looking for a Middle Python Developer with 3–4 years of experience in building and optimizing high-load, low-latency systems. This is a hands-on engineering role where you will enhance the execution platform and market data infrastructure, design and implement connectors for new centralized and decentralized exchanges, perform latency profiling and code optimization to achieve maximum throughput and minimal jitter, and work with large-scale data pipelines to support real-time and historical analysis.

    Requirements:
    - 3–4 years of professional Python development in high-performance, high-load environments.
    - Expert in concurrency and parallel programming (asyncio, multiprocessing, threading, Dask).
    - Proficiency with scientific computing libraries: NumPy, Pandas, SciPy.
    - Advanced skills in SQL and NoSQL databases; deep expertise with ClickHouse or equivalent high-volume solutions.
    - Strong knowledge of AWS (compute, storage, streaming, orchestration).
    - Strong knowledge of messaging brokers (Kafka, RabbitMQ, ZeroMQ).
    - Solid foundation in algorithms, data structures, and system design.
    - Comfortable with Linux environments, Docker, Git.

    Nice to have:
    - Knowledge of Go for latency-critical modules.
    - Experience with C++/Rust for ultra-low-latency systems.
    - Familiarity with time-series analysis, quantitative finance, or crypto trading.
    - Exposure to profiling and debugging tools (perf, flamegraphs, cProfile, py-spy).

    Responsibilities:
    - Develop and optimize Python-based trading infrastructure for high-performance execution.
    - Implement and tune parallel and concurrent systems using libraries such as multiprocessing, threading, asyncio, aiohttp, Dask, and joblib.
    - Build and maintain data pipelines using NumPy, Pandas, SciPy, with storage in SQL and NoSQL databases (especially ClickHouse, PostgreSQL, Redis).
    - Architect and scale systems on AWS, leveraging EC2, S3, Lambda, and related services.
    - Design and integrate with messaging brokers (Kafka, RabbitMQ, ZeroMQ) for distributed trading workflows.
    - Perform latency analysis, profiling, and performance optimization across trading components.
    - Contribute to CI/CD, testing, and best practices in code quality.

    The company offers:
    - Directly shape the core trading infrastructure of a high-performing crypto hedge fund.
    - Flat, high-impact environment where engineering directly drives P&L.
    - Access to cutting-edge problems in latency optimization and distributed systems.

  • · 10 views · 0 applications · 9d

    MLOps Team Lead

    Hybrid Remote · Ukraine (Kyiv), Spain · 4 years of experience · B2 - Upper Intermediate
    About us: Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently...

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client is a company where high-growth startups turn when they need to move faster, scale smarter, and make the most of the cloud. As an AWS Premier Partner and Strategic Partner, the company delivers hands-on DevOps, FinOps, and GenAI support that drives real results. The company works across EMEA, fueling innovation and solving complex challenges daily.

    About the role:
    We’re hiring an MLOps Team Lead to join the client’s innovative team. You will lead a group of highly skilled MLOps Engineers, develop robust MLOps pipelines, create cutting-edge Generative AI solutions, engage effectively with customers to understand their requirements, and oversee project success from inception to completion.

    Requirements:
    - Proven leadership experience with a track record of managing and developing technical teams.
    - Excellent customer-facing skills to understand and address client needs effectively.
    - Ability to design and implement cloud solutions and to build MLOps pipelines on AWS.
    - Experience with one or more MLOps frameworks such as Kubeflow, MLflow, DataRobot, or Airflow (an MLflow tracking sketch follows this list).
    - Fluency in Python, a good understanding of Linux, and knowledge of frameworks such as scikit-learn, Keras, PyTorch, and TensorFlow.
    - Ability to understand tools used by data scientists and experience with software development and test automation.
    - Experience with one or more LLM frameworks, such as the OpenAI SDK, Amazon Bedrock, LangChain, or LlamaIndex.
    - Experience with Docker.
    - Fluent in written and verbal communication skills in English.
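
    As a hedged sketch of experiment tracking with MLflow (one of the frameworks listed above); the experiment name, parameters, and metric values are illustrative only:

    ```python
    from pathlib import Path

    import mlflow

    Path("model_card.md").write_text("toy artifact")  # file to version with the run

    mlflow.set_experiment("churn-poc")  # hypothetical experiment name
    with mlflow.start_run():
        # Placeholder values; in a real pipeline these come from training code.
        mlflow.log_param("learning_rate", 0.05)
        mlflow.log_param("n_estimators", 200)
        mlflow.log_metric("val_auc", 0.91)
        mlflow.log_artifact("model_card.md")
    ```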

    Nice to have:
    - Working knowledge of some Vector Databases such as OpenSearch, Qdrant, Weaviate, LanceDB, etc.
    - Working knowledge of Snowflake, BigQuery, and/or Databricks.
    - GCP or Azure knowledge (DevOps/MLOps).
    - ML certification (AWS ML Specialty or similar).

    Responsibilities:
    - Manage a team of highly professional MLOps Engineers, focusing on their growth and execution excellence.
    - Lead and take responsibility for on-time, high-quality project deliveries.
    - Develop MLOps pipelines leveraging the Amazon SageMaker platform and its features.
    - Develop Generative AI solutions and POCs using the latest architectures and technologies.
    - Deliver end-to-end ML products: model development, training, validation, testing, and version control.
    - Provision ML AWS resources and infrastructure.
    - Develop ML-oriented CI/CD pipelines using GitHub Actions, Bitbucket, or similar tools.
    - Deploy machine learning models in production.
    - Help customers tackle challenges at scale using distributed training frameworks.
    - Use and write Terraform libraries for infrastructure deployment.
    - Develop CI/CD pipelines for projects of various scales and tech stacks.
    - Maintain infrastructure and environments of all types, from dev to production.
    - Monitor and administer security.

    The company offers:
    - Professional training and certifications covered by the company (AWS, FinOps, Kubernetes, etc.).
    - International work environment.
    - Referral program – enjoy cooperation with your colleagues and get a bonus.
    - Company events and social gatherings (happy hours, team events, knowledge sharing, etc.).
    - English classes.
    - Soft skills training.

  • · 27 views · 1 application · 8d

    Full Stack AI Engineer

    Full Remote · Ukraine · Product · 7 years of experience · B2 - Upper Intermediate
    About us: Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently...

    About us:
    Data Science UA is a service company with strong data science and AI expertise. Our journey began in 2016 with uniting top AI talents and organizing the first Data Science tech conference in Kyiv. Over the past 9 years, we have diligently fostered one of the largest Data Science & AI communities in Europe.

    About the client:
    Our client's mission is to reduce the time and money enterprises spend on state-of-the-art technology by 99%. It is the core constraint that drives the product philosophy. It forces the company to reject incremental improvements and pursue fundamental breakthroughs. These principles are the operating system for how the company thinks, builds, and ships. They are how the company turns radical ambition into reality.

    About the role:
    We are looking for a Full Stack AI Engineer to join the team. You'll be an architect of the revolution against enterprise tech waste. You'll build the core platform and intelligent agents that power the company's mission - not optimizing existing workflows, but eliminating them entirely.

    Requirements:
    - 5+ years of industry experience in product-centric full-stack engineering roles.
    - Very strong software engineering background with deep expertise in Python/TypeScript.
    - Hands-on experience with LLM product patterns (prompting, tool use, routing, evals, regression testing).
    Experience working with Agents & LLMs:
    - LangGraph/LangChain (knowing their limitations & trade-offs);
    - Multi-provider routing, prompt tooling, and trace/eval discipline (LangSmith).


    Experience working with Knowledge & Search Technologies:
    - Vector search: pgvector on PostgreSQL (semantic retrieval);
    - Graph traversal: Cypher on Neo4j (relationships & the enterprise context graph);
    - Keyword/BM25: lexical precision;
    - Cross-encoder reranking (BGE Reranker) + RRF to blend result lists robustly (see the sketch below).
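
    Since RRF (Reciprocal Rank Fusion) is named above, here is a minimal sketch of it; the document IDs are toy values, and in the stack described here the fused list would typically then go to the BGE cross-encoder for reranking:

    ```python
    def rrf(rankings, k=60):
        """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
        scores = {}
        for ranking in rankings:
            for rank, doc in enumerate(ranking, start=1):
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # Toy result lists standing in for the three retrievers named above.
    vector_hits = ["d3", "d1", "d7"]   # pgvector (semantic)
    graph_hits  = ["d1", "d4"]         # Neo4j traversal
    bm25_hits   = ["d7", "d1", "d2"]   # lexical
    print(rrf([vector_hits, graph_hits, bm25_hits]))
    # ['d1', 'd7', 'd3', 'd4', 'd2'] - d1 wins by ranking high in all three lists
    ```
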
    Deep knowledge of working with Data & Databases:
    - PostgreSQL (Cloud SQL): primary OLTP store;
    - SQLAlchemy and BigQuery: analytics/outcomes & cost/waste telemetry;
    - Redis (Memorystore): caching + a lightweight queue/broker;
    - GCS: object storage (files, audio, artifacts).



    Responsibilities:
    Design & Build Autonomous Agents:
    - Tackle the end-to-end challenge of creating agents that solve open-ended enterprise problems.
    - Wrangle high-volume enterprise data and build robust evaluation frameworks.
    - Master the interplay between LLM prompting, fine-tuning, and scalable system architecture.
    - Lead Customer Conversations & Translate to Product: Work directly with customer executives to translate their most critical objectives into breakthrough product features.
    - Build the "New": Own the most audacious hypotheses, like fully autonomous optimization agents that replace entire consulting engagements and obsolete the need for army-style deployments.
    - Master Deep Context: Build systems that understand enterprise architecture patterns better than most architects, automatically discovering waste and optimization opportunities.
    - Take single-threaded ownership of core AI product areas, with end-to-end responsibility for customer success and business outcomes.
    - Double as a Forward Deployed Engineer: work regularly onsite with customers and hands-on with their systems.
    - Every customer engagement becomes R&D for the platform, turning individual pain points into generalizable breakthroughs that serve the entire customer base.
