by Jeremy Siegel
“Um, could you please mute your microphone?”
Among the countless ways our lives have shifted since the pandemic, video conferencing has become the new normal – changing the way we work, the way we go to school, the way we see our doctors, even the way we watch guests on talk shows or attend live concerts. With this sudden mainstreaming of technology came the need for all of us to adopt certain skills – what to wear (at least from the waist up), where to position our camera for the best background and lighting, and how to magically change our backgrounds to look like we’re at the Grand Canyon or on the Death Star. But those hacks affected our visual experience – getting the best audio experience was a much bigger challenge.
It is also now common for our kids, our pets, the clattering ceiling fan, a passing ambulance, or a flushing toilet to join us in the boardroom during important meetings. “Um, could everyone please mute your microphones,” was one of the most uttered phrases of the year; detecting that rogue unmuted laptop often stole focus from what we were dialing in to discuss or learn. Even though online productivity during home lockdown was significant, countless minutes were lost as listeners repeatedly asked speakers to repeat what they said due to noise interference from a vast universe of potential audio distractions.
Perhaps by a stroke of luck, a company called BabbleLabs had been founded in 2017 to address this problem; Cisco acquired it in 2020. By the time the pandemic struck, its artificial intelligence (AI) team had already made significant strides in reducing noise interference and improving the quality of speech communications. While hundreds of deep learning startups in Silicon Valley were focused on visual intelligence, including facial recognition, tagging, and picture quality, BabbleLabs focused on audio. Its mission was relatively straightforward: teach AI what is and is not a desired sound during a teleconference.
"The Pro Sound Effects library gives us good coverage. The high quality of each sound file allows our software to identify, differentiate, and reduce or eliminate a seemingly endless variety of audio circumstances – so that everyone on the receiving end can clearly understand what was said."
So, how do you teach an algorithm the difference between a barking dog or car horn and a human voice? Deeper still, how do you get it to differentiate between a speaking adult and a screaming child, crying baby, or murmuring conference attendee?
Enter Pro Sound Effects. The company provides a wide variety of library access packages to its core users, including sound designers for movies, TV shows, and video games, and its complete library of high-quality, professionally produced audio files exceeds 850,000 unique sounds in over 600 categories. Cisco licensed PSE’s entire 5.3 TB library to start teaching its AI, one broken drinking glass and humming fan blade at a time.
Cisco’s mission is to remove 100 percent of unwanted noise from video conferencing applications. And the better their algorithm understands the infinite sounds of the world, the better it can recognize and filter out unwanted sounds, letting human speech shine through.
"The Pro Sound Effects library gives us good coverage," said Cisco's Technical Lead, Cyprian Wronka. "The high quality of each sound file allows our software to identify, differentiate, and reduce or eliminate a seemingly endless variety of audio circumstances – so that everyone on the receiving end can clearly understand what was said."
With this bleeding-edge speech technology in hand, Webex users can expect meetings to continue to sound better and better.
Pro Sound Effects powers the audio data intelligence for leading Machine Learning and Audio Research teams.