Data Extraction Specialist / Document Parsing Engineer
Ukraine · 3 years of experience · Upper-Intermediate
About the company
VReal Soft is a Ukrainian IT company with 7 years of experience building custom software solutions, mobile apps, and web platforms for clients in the US, Israel, and Europe. We are currently opening a position for a Data Extraction Specialist, combining project work in the healthcare domain with the opportunity for long-term in-house collaboration.
Position Summary
We are looking for a skilled Data Extraction Engineer / Document Automation Developer to help build a system that automatically extracts structured information from various medical and immunization records and stores it in a structured database format.
You’ll be working with diverse document formats (PDFs, scans, electronic forms), using OCR and NLP-based tools to process them.
Your responsibilities:
- Analyze medical document templates to identify key fields
- Develop pipelines for automatic extraction of fields such as:
  - Student Full Name
  - Date of Birth
  - Sex/Gender
  - Vaccination Information (e.g. Type, Date, Dose, Clinic, Manufacturer)
  - Next Due Dates (calculated from context or intervals)
- Build and integrate document processing flows using OCR or NLP tools such as Tesseract, Azure Form Recognizer, Amazon Textract, etc.
- Validate and store extracted data in a structured database
- Ensure accuracy and integrity of parsed data
- Collaborate with development and QA teams to integrate the solution into the main platform
Requirements:
- Hands-on experience with OCR tools: Tesseract, AWS Textract, Azure Form Recognizer, Google Document AI
- Strong knowledge of .NET / C#, especially for parsing documents and data transformation
- Understanding of NLP techniques and Regular Expressions
- Experience transforming unstructured data into structured formats
- Solid understanding of databases and storage formats (e.g. JSON, SQL)
Nice to have:
- Experience in the healthcare or education domain
- Familiarity with ML/AI-based approaches to document processing
- English proficiency for reading documentation and written communication
What we offer:
- Opportunity to work on a meaningful healthcare-related product
- Potential for long-term in-house cooperation after the initial project
- Flexible working hours
- Support and mentorship from a technical team
- Comfortable office in Cherkasy, or remote work option
- Internal training and English language development