AI Engineer (Real-Time Audio Commentary) Offline

About the Project

We are building a cutting-edge, real-time AI sports commentator. This system analyzes live video streams of games to generate and deliver commentary. While our MLOps team focuses on the low-latency video analysis, your role is to ensure the final output sounds completely human.

 

Role Summary

We are seeking a specialized AI Engineer with deep expertise in speech synthesis, audio data pipelines, and real-time audio generation. Your core mission is to solve the "last mile" of our project: making the AI-generated commentary sound natural and high quality, and feel "like a real person as opposed to AI".

This is a critical role requiring specific experience in making synthetic speech sound human, emotional, and context-aware. You will be responsible for everything from dataset creation to the final synthesis models that generate dynamic, natural-sounding commentary in real time.

 

Key Responsibilities

  • Audio Dataset Architecture: Design, build, and manage the large-scale audio dataset that powers our commentator. This includes sourcing human-recorded clips, defining data structures, and developing annotation strategies.
  • Contextual Metadata: Implement a robust metadata system for audio clips, tagging them by excitement level, match event type (e.g., goals, fouls), player/team tags, and other contextual data.
  • Speech Synthesis & Cloning: Research, evaluate, and implement state-of-the-art speech synthesis (TTS) or voice cloning models (e.g., using paid tools or in-house models) to achieve human-like intonation and flow.
  • Dynamic Audio Generation: Architect the system for handling dynamic content, especially player and team names.
  • Natural "Stitching": Solve the critical challenge of stitching audio segments together (e.g., "What a goal by" + "[player_name]!") while ensuring the transitions sound natural and maintain consistent tone and emotion.
  • Emotional Modeling: Develop the system for controlling and triggering appropriate variations in excitement and tone based on in-game events. This includes mapping game context (like a last-minute goal) to a specific emotional intensity.
  • Commentary System Architecture: Design the pipeline that maps game data and events (passes, goals, momentum shifts) to the selection or generation of contextually appropriate commentary.
  • Trigger Logic: Implement the logic (rule-based, ML, or hybrid) that maps game context to the correct audio playback.
  • Synchronization & Timing: Work with the MLOps team to ensure all commentary is tightly synchronized and aligns naturally with the on-screen events.
  • Audio Post-Processing: Investigate and apply audio mixing or post-processing techniques to dynamically match the in-game context, such as blending with crowd noise.
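
To make the metadata, emotional-modeling, and trigger-logic responsibilities above more concrete, here is a minimal rule-based sketch in Python. It is only an illustration, not our production design: the names Clip, GameEvent, BASE_EXCITEMENT, excitement_for, and select_clip, the 1 to 5 excitement scale, and the "minute 85 or later in a close game" rule are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Clip:
    """Hypothetical metadata record for one recorded commentary clip."""
    clip_id: str
    event_type: str   # e.g. "goal", "foul"
    excitement: int   # assumed scale: 1 (calm) to 5 (maximum)
    text: str


@dataclass(frozen=True)
class GameEvent:
    """Hypothetical event emitted by the video-analysis side."""
    event_type: str
    minute: int
    score_diff: int   # absolute goal difference after the event


# Assumed baseline intensity per event type (illustrative values only).
BASE_EXCITEMENT = {"pass": 1, "foul": 2, "goal": 4}


def excitement_for(event: GameEvent) -> int:
    """Map game context to an emotional intensity between 1 and 5."""
    level = BASE_EXCITEMENT.get(event.event_type, 1)
    # Assumed rule: late events in a close game are one step more intense.
    if event.minute >= 85 and event.score_diff <= 1:
        level += 1
    return min(level, 5)


def select_clip(event: GameEvent, library: Sequence[Clip]) -> Optional[Clip]:
    """Pick the clip of the matching event type closest to the target intensity."""
    target = excitement_for(event)
    candidates = [c for c in library if c.event_type == event.event_type]
    if not candidates:
        return None  # nothing recorded for this event type
    return min(candidates, key=lambda c: abs(c.excitement - target))
```

In this sketch, a goal in minute 90 of a one-goal game resolves to intensity 5, so the most excited "goal" clip in the library is selected. A real system would likely replace the hand-written rule with a learned or hybrid model and add player and team tags to the lookup.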

 

Required Qualifications & Experience

  • 3+ years of demonstrated engineering experience in speech synthesis (TTS) or a related AI/ML field.
  • Proven experience in building and managing large-scale audio datasets for machine learning.
  • Deep expertise in modern speech synthesis (TTS), voice cloning, or phoneme-based generation techniques.
  • Hands-on experience solving audio "stitching" or "splicing" challenges, with a strong portfolio of achieving natural intonation and prosody.
  • Experience with modeling emotion, excitement, or intonation in speech.
  • Strong understanding of audio system architecture, including how to map game data/events to audio triggers.
  • Familiarity with audio processing, mixing, or post-production techniques.
  • Strong programming skills (e.g., Python) and experience with relevant audio and machine learning libraries.

 

What We Offer:

  • Medical Insurance in Ukraine and Multisport program in Poland;
  • Offices in Ukraine and Poland (Wroclaw);
  • All official holidays;
  • Paid vacation and sick leaves;
  • Tax & accounting services for Ukrainian contractors;
  • The company is ready to provide all the necessary equipment;
  • English classes up to three times a week;
  • Mentoring and Educational Programs;
  • Regular Activities on a Corporate level (Incredible parties, Team Buildings, Sports Events, and Tech Events);
  • Advanced Bonus System.

 

Required skills experience

Required domain experience

Machine Learning / Big Data: 2 years

Required languages

English: B2 (Upper Intermediate)

Python, Machine Learning, LLM, Generative AI

The job ad is no longer active
