AI Engineer (Real-Time Audio Commentary) Offline
About the Project
We are building a cutting-edge, real-time AI sports commentator. This system analyzes live video streams of games to generate and deliver commentary. While our MLOps team focuses on the low-latency video analysis, your role is to ensure the final output sounds completely human.
Role Summary
We are seeking a specialized AI Engineer with deep expertise in speech synthesis, audio data pipelines, and real-time audio generation. Your core mission is to solve the "last mile" of our project: making the AI-generated commentary flow "very naturally," sound "very high quality," and feel "like a real person as opposed to AI."
This is a critical role requiring specific experience in making synthetic speech sound human, emotional, and context-aware. You will be responsible for everything from dataset creation to the final synthesis models that generate dynamic, natural-sounding commentary in real time.
Key Responsibilities
- Audio Dataset Architecture: Design, build, and manage the large-scale audio dataset that powers our commentator. This includes sourcing human-recorded clips, defining data structures, and developing annotation strategies.
- Contextual Metadata: Implement a robust metadata system for audio clips, tagging them by excitement level, match event type (e.g., goals, fouls), player/team tags, and other contextual data.
- Speech Synthesis & Cloning: Research, evaluate, and implement state-of-the-art speech synthesis (TTS) or voice cloning models (e.g., paid third-party tools or in-house models) to achieve human-like intonation and flow.
- Dynamic Audio Generation: Architect the system for handling dynamic content, especially player and team names.
- Natural "Stitching": Solve the critical challenge of stitching audio segments together (e.g., "What a goal by" + "[player_name]!") while ensuring the transitions sound natural and maintain consistent tone and emotion.
- Emotional Modeling: Develop the system for controlling and triggering appropriate variations in excitement and tone based on in-game events. This includes mapping game context (like a last-minute goal) to a specific emotional intensity.
- Commentary System Architecture: Design the pipeline that maps game data and events (passes, goals, momentum shifts) to the selection or generation of contextually appropriate commentary.
- Trigger Logic: Implement the logic (rule-based, ML, or hybrid) that maps game context to the correct audio playback.
- Synchronization & Timing: Work with the MLOps team to ensure all commentary is tightly synchronized and aligns naturally with the on-screen events.
- Audio Post-Processing: Investigate and apply audio mixing or post-processing techniques to dynamically match the in-game context, such as blending with crowd noise.
Required Qualifications & Experience
- 3+ years of demonstrated engineering experience in speech synthesis (TTS) or a related AI/ML field.
- Proven experience in building and managing large-scale audio datasets for machine learning.
- Deep expertise in modern speech synthesis (TTS), voice cloning, or phoneme-based generation techniques.
- Hands-on experience solving audio "stitching" or "splicing" challenges, with a strong portfolio of achieving natural intonation and prosody.
- Experience with modeling emotion, excitement, or intonation in speech.
- Strong understanding of audio system architecture, including how to map game data/events to audio triggers.
- Familiarity with audio processing, mixing, or post-production techniques.
- Strong programming skills (e.g., Python) and experience with relevant audio and machine learning libraries.
What We Offer
- Medical Insurance in Ukraine and Multisport program in Poland;
- Offices in Ukraine and Poland (Wroclaw);
- All official holidays;
- Paid vacation and sick leave;
- Tax & accounting services for Ukrainian contractors;
- All necessary equipment provided by the company;
- English classes up to three times a week;
- Mentoring and Educational Programs;
- Regular corporate activities (parties, team buildings, sports events, and tech events);
- Advanced Bonus System.
Required domain experience
| Machine Learning / Big Data | 2 years |
Required languages
| English | B2 - Upper Intermediate |