Google today announced a new feature that automatically enhances the audio in YouTube Stories by isolating the speech of a video’s subject. The feature is called “Enhance speech” and is accessible directly in the volume controls section of the editing tools. Once enabled, it reduces background noise so that the subject’s voice is clearer. Enabling it also plays back the enhanced speech in a loop so that creators can compare how their video sounds with the effect applied, and they can toggle it on and off until they get the desired output.
The new “Enhance speech” feature for recording YouTube Stories builds on the ML-based “Looking to Listen” technology that the company introduced two years ago. It relies on audio-visual cues such as mouth movements and facial expressions to separate the subject’s voice from other sounds. Besides isolating the subject’s voice, the technology also helps suppress background noise. Google says it has tested the technology extensively to make sure that “it performs consistently across different recording conditions and for people with different appearances and voices.”
The company adds that the machine learning model behind this feature has been trained to handle a wide range of voices, languages, and accents, as well as a multitude of visual appearances. To better understand the audio-visual cues, the company also took into account attributes such as the speaker’s age, skin tone, spoken language, voice pitch, visibility of the speaker’s face, head pose throughout the video, facial hair, and presence of glasses, to name a few. Beyond “Enhance speech”, the company is exploring ways to bring the technology to more products.
Listed below are a few videos that demonstrate how the new speech enhancement feature comes to life: