IOIX Ukraine

Joined in 2019

100% answers

Web Crawling/scraping Engineer (5+), Google, Twitter platforms to $5000

Part-time · Full Remote · Ukraine · Product · 5 years of experience · Intermediate
Bypassing Google CAPTCHA and Twitter's hidden bans for data collection through crawling the search results on these platforms.

Preference will be given to candidates with ready-made solutions who can demonstrate their work. Respond to this job offer only if you comply.

Work Experience: Minimum of 5 years of experience in web crawling and automation.

Key responsibilities include collecting statistics on various types of blocks and developing automated tests to check the responses of websites. The specialist also works on automating data structure management, such as user profiles.

An important part of the job is simulating user behavior on Google, social networks, and other sites, including with the help of artificial intelligence. The specialist should not act only as a consultant or analyst: they must be able to gather and analyze data independently, rather than request prepared information about how often CAPTCHA blocks occur and which types of CAPTCHAs are shown to different users.

Candidate Requirements:

Experience:
- Proven experience in developing web scrapers and crawlers for collecting data from complex web applications (5+ years).
- Deep understanding of the principles of HTTP, HTML, CSS, and JavaScript.
- Experience with HTML parsing tools (Beautiful Soup, lxml, etc.).
- Experience with browser automation tools (Selenium, Puppeteer).
- Experience in bypassing blocks and CAPTCHAs.
- Experience with proxy servers and VPNs.
Skills:
- Ability to analyze the structure of web pages and APIs.
- Ability to develop effective and reliable blocking bypass algorithms.
- Ability to work with asynchronous requests.
- Ability to write clean, maintainable, and well-documented code.
IOIX Ukraine

LLM QA Engineer / Analyst to $2500

Part-time · Full Remote · Ukraine · Product · 3 years of experience · Intermediate
We're looking for an LLM QA Engineer/Analyst to analyze and improve the quality of LLM results. This role involves collaborating with LLM engineers to enhance overall performance.

The LLM QA engineer has basic knowledge of testing, training, and fine-tuning language models, can quickly assess results and make the necessary adjustments, and helps identify the model best suited to our needs in terms of quality and performance.

There are many good pre-trained models with good performance.

The work strategy for LLM QA involves:
- conducting quick tests,
- making comparisons,
- selecting the best pre-trained model for fine-tuning,
- using advanced LLMs such as Anthropic's Claude or OpenAI's o1 to create a training dataset,
- fine-tuning our selected pre-trained model, like Llama, to achieve optimal results.

Key Responsibilities:
- Conduct automated and manual checks to verify model responses.
- Prepare automated data sets from raw data according to specified requirements.
- Create automation scripts for statistical analysis to handle typical requests, such as:
  Identifying the most and least popular items from a data set.
  Sorting or filtering data sets based on predefined criteria.
- Develop automated tests for large data sets, ensuring success based on simple criteria, such as predefined numeric or textual values, or ranges of values.
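As an illustration of the kind of automation scripts meant above, here is a minimal Python sketch. The function names (`popularity_extremes`, `filter_and_sort`, `check_in_range`) and the record layout are assumptions made for the example, not part of our codebase.

```python
from collections import Counter


def popularity_extremes(items):
    """Return (most_common_item, least_common_item) by frequency."""
    counts = Counter(items)
    ranked = counts.most_common()  # ordered from most to least frequent
    return ranked[0][0], ranked[-1][0]


def filter_and_sort(records, key, minimum):
    """Keep records whose `key` value is >= minimum, sorted in descending order."""
    kept = [r for r in records if r[key] >= minimum]
    return sorted(kept, key=lambda r: r[key], reverse=True)


def check_in_range(values, low, high):
    """Return the values outside [low, high]; an empty list means the check passes."""
    return [v for v in values if not (low <= v <= high)]
```

For example, `check_in_range(scores, 0, 5)` returning an empty list could serve as the pass criterion for a whole data set, where `scores` is any list of numeric results.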
Candidate Selection Criteria:
- A portfolio demonstrating experience with similar tasks.
- Code examples of automation that can be reviewed.
- Platform: Linux. Language preferences include Python, Bash, Java, and JavaScript. Lesser interest in Ruby, Rust, Go, and other rising technologies.
Respond to this job offer with a list of your skills and experience that match what is required to handle such tasks.
IOIX Ukraine

Senior LLM engineer - custom LLMs creation and fine-tuning

Part-time · Full Remote · Worldwide · Product · 3 years of experience · Pre-Intermediate
Custom LLM Models Creation and Fine-Tuning
(for Japanese and English social media content)

The project aims to develop a custom machine learning model to
1) accurately detect country names and
2) determine if texts pertain to Japanese national elections
... improving upon common issues found in standard language models.

To respond to this offer, list the competencies needed for this task and confirm they align with your skills and experience.

Introduction
Custom model tuning aims to enhance accuracy in entity detection and in handling special cases. Standard models like Llama, Mistral, Llama3, QwQ, and Gemma often face several issues:
1. Inaccurate or vague responses unsuitable for extracting entity names or feature values.
2. Inconsistent response formats that complicate parsing.
3. Incorrect outputs in feature detection and attribute tasks.
4. Semantic errors in entity detection and evaluation tasks.

The custom model tuning attempt addresses problem #3. This is a relatively simple task for an LLM, typically solved quickly (less than 2 seconds per request) when filtering arrays of 10-100 thousand elements.
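As a rough planning aid, the 2-second figure can be turned into a batch time estimate. The sketch below assumes the limit applies to each item and that requests are spread evenly over parallel workers; `batch_hours` and its `workers` parameter are illustrative names, not part of the project.

```python
def batch_hours(n_items, seconds_per_request=2.0, workers=1):
    """Upper-bound wall-clock time in hours for processing n_items requests."""
    total_seconds = n_items * seconds_per_request / workers
    return total_seconds / 3600


# Sequential upper bounds for the array sizes mentioned above.
small = batch_hours(10_000)    # about 5.6 hours
large = batch_hours(100_000)   # about 55.6 hours
```

Under these assumptions, the largest arrays would take more than two days if processed one item at a time, which is why response time matters for this task.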

Task Examples:
1. Identify countries referenced in the text, either directly or indirectly.
2. Assess whether the text pertains to national elections in Japan.
Both tasks analyze Japanese or mixed Japanese-English texts up to 500 characters from Twitter, encompassing official news, personal opinions, and dialogues.

IMPORTANT NOTE: Identifying a country's name in the text is straightforward and can be handled effectively with regular expressions; it is a classic NLP task rather than one that needs an LLM. The challenge lies in detecting indirect references to a country, as specified in the task definition. For instance, mentions like the "White House" or a US political party point to the USA, while the name of a Japanese political party indicates Japan. This complexity is the main challenge of the task.

It is also preferable to leave out of the list an explicit mention of a country, language, or nationality when it is not the subject, object, or semantic agent of the sentence.
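To make the distinction concrete, here is a minimal Python sketch of the regular-expression baseline. The two lookup tables are tiny, hypothetical examples. A hand-written table cannot cover indirect references in general, and it cannot tell whether a mention is the main subject of the sentence, which is exactly where a tuned model is expected to help.

```python
import re

# Hypothetical, deliberately tiny lookup tables (keys in lowercase).
EXPLICIT = {"日本": "日本", "アメリカ": "アメリカ", "米国": "アメリカ"}
INDIRECT = {"ホワイトハウス": "アメリカ", "white house": "アメリカ", "自民党": "日本"}


def _pattern(terms):
    # Longest terms first so that longer names win over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)


EXPLICIT_RE = _pattern(EXPLICIT)
INDIRECT_RE = _pattern(INDIRECT)


def find_countries(text):
    """Return (explicit, indirect) country lists, deduplicated in order of appearance."""
    results = []
    for regex, table in ((EXPLICIT_RE, EXPLICIT), (INDIRECT_RE, INDIRECT)):
        names = []
        for match in regex.finditer(text):
            country = table[match.group(0).lower()]
            if country not in names:
                names.append(country)
        results.append(names)
    return results[0], results[1]
```

For a text that mentions both ホワイトハウス and 日本, the first list contains 日本 and the second contains アメリカ.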

Country List Detection with LLM model:
Common errors:
- Including geographical locations (regions, prefectures)
- Including continents and organization names
- Missing indirect references
- Incorrect detection based on political parties, positions, politician names
Result: delimited list of country names in Japanese (0 to 4 elements)
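A result in this format can be checked automatically before it is accepted. The Python sketch below is one possible validator. It assumes a comma delimiter (ASCII or Japanese), and the `ALLOWED` whitelist is a hypothetical stand-in for a full list of country names in Japanese.

```python
import re

# Hypothetical whitelist; a real one would hold every country name in Japanese.
ALLOWED = {"日本", "アメリカ", "中国", "韓国", "ロシア"}


def parse_country_list(raw, max_items=4):
    """Parse a delimited model answer into a validated list of country names."""
    parts = [p.strip() for p in re.split(r"[,，、]", raw) if p.strip()]
    unique = list(dict.fromkeys(parts))  # drop duplicates, keep order
    unknown = [p for p in unique if p not in ALLOWED]
    if unknown:
        # Catches regions, continents, and organization names.
        raise ValueError(f"not a country name: {unknown}")
    if len(unique) > max_items:
        raise ValueError(f"expected at most {max_items} countries, got {len(unique)}")
    return unique
```

An empty answer yields an empty list, which is a valid result for texts that reference no country.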

Election Topic Detection with LLM model:
Task: determine if Japanese national elections are the main text topic

Positive cases include: voting processes, preparation, election campaigns, program discussions
Common errors:
- Wrong country identification
- Incorrect election type detection
- False positives on keyword mentions
Result: clear boolean format (YES/NO)
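A strict parser makes format problems visible instead of hiding them. The Python sketch below accepts only a bare YES or NO, ignoring case, surrounding whitespace, and trailing punctuation. The name `parse_yes_no` and the choice to reject anything else are assumptions made for the example.

```python
def parse_yes_no(raw):
    """Convert a model answer to a boolean; reject anything but a bare YES or NO."""
    token = raw.strip().strip(".!。").upper()
    if token == "YES":
        return True
    if token == "NO":
        return False
    raise ValueError(f"unexpected answer format: {raw!r}")
```

Answers that wrap the verdict in a sentence raise an error, so they can be counted as format failures during testing.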

Platforms and Tools

The main analysis tool is Ollama, utilized as both a CLI and REST HTTP service for experimental research and routine processing. Ideally, the custom model should be manageable via Ollama, though this is not mandatory. If Ollama is unavailable, the model must offer a REST HTTP service for local deployment on a dedicated server.
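For reference, a call to Ollama's REST service can look like the Python sketch below. It assumes a local Ollama instance on its default port and uses the `/api/generate` endpoint with streaming turned off. The helper names and the 2-second timeout are choices made for the example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local address


def build_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming generate request for the Ollama REST API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=2.0):
    """Send the prompt and return the generated text (needs a running Ollama)."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

A custom model served through Ollama would be addressed by passing its name as `model`. A model served by another REST service would need its own request format.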

Datasets of 10 000 items will be provided, along with small sets (10-100) of typical error cases for selective testing.
That is, we need to split the task into two stages or steps:
1. Preparation of quality datasets for training
2. Training, tuning, or creating a model from scratch.
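The small sets of typical error cases lend themselves to a simple regression check after every training run. The Python sketch below shows one way to do this. The `evaluate` function and the example format, a list of (text, expected result) pairs, are assumptions, and `predict` stands for any callable that wraps the model.

```python
def evaluate(predict, examples):
    """Run predict on each (text, expected) pair; return (accuracy, failures)."""
    failures = []
    for text, expected in examples:
        actual = predict(text)
        if actual != expected:
            failures.append((text, expected, actual))
    accuracy = 1 - len(failures) / len(examples) if examples else 0.0
    return accuracy, failures


# A keyword stub in place of a real model, to show the failure report.
def keyword_stub(text):
    return "選挙" in text


cases = [
    ("衆議院選挙の投票日", True),
    ("今日の天気", False),
    ("アメリカの選挙", False),  # election, but not a Japanese one
]
accuracy, failures = evaluate(keyword_stub, cases)
```

Here the keyword stub fails on the third case, which is the "wrong country identification" error type that the tuned model should avoid.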
_______

Candidate requirements

The candidate should have strong experience with these specific tools rather than a broader but shallower knowledge of many frameworks. This focused approach aligns better with the project's specific goals of building a fast, accurate multilingual classification system.

Machine Learning & NLP Expertise:
- Strong background in Natural Language Processing (NLP) and text classification
- Experience with multilingual text processing (specifically Japanese and English)
- Proficiency in developing and fine-tuning machine learning models
- Knowledge of modern language models and their applications
Programming & Tools:
- Proficiency in Python (or C++, Ruby, JavaScript, or another language, depending on the framework and base model) and relevant ML/NLP libraries
- Experience with text processing and classification frameworks
- Familiarity with large language models (LLMs)
Data Processing Skills:
- Experience in handling multilingual datasets
- Ability to work with various data formats and sources
- Knowledge of data cleaning and preprocessing techniques
- Experience with social media data processing (particularly Twitter/X data)
Performance Optimization:
- Ability to optimize models for speed (requirement of 2-second response time)
- Experience in handling large-scale data processing (10,000-100,000 items)
- Skills in model optimization and efficiency improvement
Task-Specific Experience:
- Text classification and entity recognition (specifically for country detection)
- Context-based classification (such as election-related content detection)
- Experience with short-text classification (500 characters or less)
Language Requirements:
- Proficiency in Japanese language processing
- Experience with mixed language content (Japanese-English)
- Understanding of multilingual NLP challenges
Education and Experience:
- Master's or Ph.D. in AI/LLMs
- Minimum 3-5 years of experience in LLM, ML, or NLP development
- Demonstrated experience with similar text classification projects, with examples of specialized models fine-tuned or created from scratch
Additional Desired Qualifications:
- Experience with Japanese language NLP tools and frameworks
- Knowledge of social media content analysis
- Background in building production-ready ML systems
- Understanding of ML model deployment and scaling
1. Experience with Major LLM Platforms and Their Tools:
• Meta's LLaMA ecosystem (especially LLaMA 2)
• Experience with Ollama deployment and management
• Knowledge of other major LLMs: OpenAI API, Anthropic Claude, Cohere
• Understanding of open-source LLM deployment and fine-tuning

2. Fine-tuning and Adaptation Skills:
• Experience in adapting pre-trained models for specific tasks
• Knowledge of efficient fine-tuning techniques (LoRA, QLoRA, PEFT)
• Understanding of prompt engineering and few-shot learning
• Experience with model quantization and optimization

3. Practical Skills:
• Ability to evaluate and choose appropriate base models
• Experience in model deployment and serving
• Knowledge of cost-effective approaches to model adaptation
• Understanding of inference optimization techniques

4. Task-Specific Requirements:
• Experience with multilingual models (Japanese-English specifically)
• Knowledge of entity recognition fine-tuning
• Understanding of context classification
• Experience with short text processing optimization

The task of extracting country names from text is called Named Entity Recognition (NER). When the names are explicitly present in the text, extracting them is Named Entity Extraction (NEE); when the entities are not explicitly mentioned and must be inferred from context, it is Named Entity Identification (NEI).

Manual verification of each result is very important because otherwise, even a small percentage of incorrect data can significantly damage the trained model and reduce the statistical quality of its future performance.

The first task is preparing quality samples for model training. It focuses on the end result: quality datasets with verified results for both tasks, the list of countries and the topic of elections in Japan.

We expect quality datasets of various sizes suitable for LLM training as output.

We can stipulate that the first small dataset (0.5-2K) should be provided within a week, and larger ones (>5K) later...
A sample dataset will be provided in JSON format, filtered by topic and likely to contain the target entities:

- A 55K-element archive
- A 1.5K-element dataset with results from different models for the country list task, which we already have and which may be useful as comparison examples

The task should be split into two stages or steps:
1. Preparation of quality datasets for training
2. Training, tuning, or creating a model from scratch.
