ML Data Engineer
Vidar Systems is an acoustic technology startup at the cutting edge of defense and security solutions. We use retail hardware components to build military-grade systems that redefine acoustic weapon locating in terms of affordability, efficiency, and reliability. Our team is globally distributed, with primary activities and production based in Kyiv, the heart of Ukraine.
About the Role:
Become a core member of our lean, high-impact engineering team and help shape Ukraine’s defensive edge through data. In this role, you’ll be responsible for managing and transforming large volumes of acoustic and telemetry data to fuel machine learning models deployed in mission-critical systems. You will own the data workflows that directly support our next-generation AI-based battlefield sensing technologies.
What You’ll Do:
- Build and maintain robust pipelines for ingesting, transforming, and storing structured and unstructured acoustic data.
- Develop scalable solutions for data labeling, augmentation, validation, and monitoring.
- Collaborate closely with ML researchers to optimize dataset quality, feature representation, and data curation for deep learning.
- Manage datasets across different environments, ensuring consistency, versioning, and reproducibility.
- Create tooling and dashboards to visualize data quality and pipeline performance.
- Identify patterns and extract insights from data to enhance existing models or develop innovative features.
What You Need:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Applied Math, or a related field.
- 2+ years of experience in data engineering roles, ideally within ML/AI-driven teams.
- Strong Python skills, with experience in building and managing ETL/ELT pipelines.
- Hands-on experience with orchestration frameworks such as Airflow, Prefect, Luigi, or similar.
- Familiarity with ML frameworks such as TensorFlow, PyTorch, or scikit-learn.
- Experience with data versioning, lineage, and experiment-tracking tools (e.g. DVC, MLflow, Weights & Biases).
- Proficiency in data processing libraries: Pandas, NumPy, Dask.
- Strong communication and collaboration skills; ability to work independently in high-stakes contexts.
- Technical English (documentation, code comments, communication in a distributed team).
Nice to Have:
- Experience working with audio or time-series data.
- Exposure to C/C++ or embedded systems environments.
- Background in acoustic signal processing or related domains.
- Familiarity with DevOps tools (Docker, Kubernetes, CI/CD) and cloud infrastructure (AWS/GCP).
Our Recruitment Process:
- Intro Call with Recruiter – 45 min
- Technical Interview with ML Lead and CTO – 90 min
- Test Task – 4–6 hours
- Final Interview – 60 min
Why Join Us?
- High Autonomy: You’ll be trusted to lead key systems and make your mark on our data stack.
- Meaningful Work: Your pipelines will feed models that support real soldiers on the front line.
- Innovation at the Edge: Help us redefine how machine learning operates in harsh, real-world acoustic environments.
- Flexible Format: We support hybrid/remote work with a bias toward impact over hours.