
Google Duo uses a new codec for better call quality over bad connections



While US operators are busy marketing their new 5G networks, the reality is that the vast majority do not experience the advertised speeds. There are still many parts of the United States – and around the world – where data rates are low, so to compensate, services like Google Duo use compression techniques to deliver the best possible video and audio experience. Google is now testing a new audio codec that aims to significantly improve the sound quality of poor network connections.

In a blog post, the Google AI team describes Lyra, its new high-quality, very-low-bit-rate speech codec. Like traditional parametric codecs, Lyra's basic architecture involves extracting distinctive speech attributes (also known as "features") in the form of log-mel spectrograms, which are then compressed, transmitted over the network, and recreated at the other end using a generative model. Unlike more traditional parametric codecs, however, Lyra uses a new high-quality generative audio model that can not only extract the critical parameters from speech but also reconstruct speech from a minimal amount of data. Lyra's generative model builds on Google's previous work on WaveNetEQ, the generative-model-based packet-loss-concealment system currently used in Google Duo.
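To make the first stage of that pipeline concrete, here is a minimal numpy-only sketch of extracting log-mel features from a mono signal. The window size, hop, and band count are illustrative assumptions, not Lyra's actual parameters:

```python
import numpy as np

def log_mel_features(signal, sample_rate=16000, n_fft=512,
                     hop=160, n_mels=40):
    """Compute log-mel spectrogram frames from a mono signal."""
    # Short-time power spectra via a sliding Hann window.
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames)                      # (n_frames, n_fft//2 + 1)

    # Triangular mel filterbank mapping FFT bins to mel bands.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0),
                                    hz_to_mel(sample_rate / 2),
                                    n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log compression stabilises the dynamic range of the features.
    return np.log(power @ fb.T + 1e-6)

# Example: 0.5 s of synthetic noise standing in for speech at 16 kHz.
rng = np.random.default_rng(0)
feats = log_mel_features(rng.standard_normal(8000))
print(feats.shape)  # one 40-dim feature vector per 10 ms hop
```

In a real codec these compact feature vectors, not the raw waveform, are what get quantised and sent over the network, which is where the bit-rate savings come from.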

Lyra architecture

Lyra’s basic architecture. Source: Google

Google says this approach puts Lyra on a par with the state-of-the-art waveform codecs used in many streaming and communication platforms today. Lyra's advantage over those waveform codecs, according to Google, is that it does not send the signal sample by sample, which requires higher bit rates (and thus more data). To overcome the computational complexity of running a generative model on the device, Google says Lyra uses a "cheaper recurrent generative model" that works "at a lower rate" but generates multiple signals in different frequency ranges in parallel, which are later combined "into a single output signal at the desired sample rate." Running this generative model on a mid-range device in real time results in a processing latency of 90 ms, which Google says is "in line with other traditional speech codecs."
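The subband trick described above can be illustrated with a toy numpy sketch: each band is generated at a fraction of the output rate, upsampled, shifted into its frequency range, and summed. This is only a schematic of the idea (the band count, crude upsampling, and cosine modulation are assumptions for illustration, not Lyra's actual synthesis filterbank):

```python
import numpy as np

def combine_subbands(subbands, fs=16000):
    """Toy subband synthesis: low-rate band signals are upsampled,
    modulated to their band centres, and summed into one output."""
    n_bands = len(subbands)
    band_width = fs / 2 / n_bands               # each band covers fs/2/n_bands Hz
    out = np.zeros(len(subbands[0]) * n_bands)
    t = np.arange(len(out)) / fs
    for b, low_rate_signal in enumerate(subbands):
        up = np.repeat(low_rate_signal, n_bands)    # crude sample-and-hold upsampling
        centre = (b + 0.5) * band_width             # centre frequency of band b
        out += up * np.cos(2 * np.pi * centre * t)  # shift the band into place
    return out

# Four bands, each generated at 4 kHz, combine into one 16 kHz signal.
rng = np.random.default_rng(1)
bands = [rng.standard_normal(4000) for _ in range(4)]  # 1 s at the low rate
y = combine_subbands(bands)
print(len(y))  # 16000 samples: 1 s at the full output rate
```

The point of the design is that each band's generator only has to run at a quarter of the output sample rate, and the four generators can run in parallel, which is what makes real-time on-device synthesis feasible.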

Paired with the AV1 codec for video, Google says that video chats can work even for users on an old 56 kbps modem, because Lyra is designed to operate in heavily bandwidth-constrained environments, at bit rates as low as 3 kbps. According to Google, Lyra comfortably outperforms the royalty-free, open-source Opus codec at very low bit rates, as well as other codecs such as Speex, MELP, and AMR. Google provided several speech samples; with the exception of the audio encoded with Lyra, each sample suffers from degraded quality at these very low bit rates.
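The bandwidth arithmetic behind the modem claim is simple to check. The frame duration below is an assumption for illustration; Lyra's real framing may differ:

```python
# Rough bandwidth budget for a call over a 56 kbps modem link,
# with voice at Lyra's 3 kbps target (frame length is assumed).
modem_kbps = 56
lyra_kbps = 3
frame_ms = 40  # assumed voice frame duration

bytes_per_frame = lyra_kbps * 1000 / 8 * (frame_ms / 1000)
video_budget_kbps = modem_kbps - lyra_kbps

print(bytes_per_frame)     # 15.0 bytes of voice payload per 40 ms frame
print(video_budget_kbps)   # 53 kbps left over for AV1 video (before overhead)
```

A 15-byte voice frame every 40 ms shows why a generative codec, rather than a sample-by-sample waveform codec, is needed at these rates: there is simply no room to transmit the waveform itself.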

Pure speech: the original recording plus three encodings at very low bit rates

Noisy environment: the original recording plus three encodings at very low bit rates

Google says it trained Lyra "with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries and then verified the audio quality with expert and crowdsourced listeners." The new codec is already rolling out in Google Duo to improve call quality on very low-bandwidth connections. While Lyra currently targets voice use cases, Google is exploring how to turn it into a general-purpose audio codec.
