
‘Universal Translator’ copies and lip-syncs speakers – but Google warns against abuse

Google is testing a powerful new translation service that redubs video in a new language while syncing the speaker’s lips with words they’ve never uttered. It could be useful in many contexts, but the company has been upfront about the potential for abuse and the steps it is taking to prevent it.

“Universal Translator” was shown at Google I/O during a presentation by James Manyika, who heads up the company’s new “Technology and Society” division. It was offered as an example of something that has only recently been made possible by advances in AI, but at the same time carries serious risks that need to be considered from the outset.

The “experimental” service takes an input video – in this case a lecture from an online course originally recorded in English – transcribes the speech, translates it, regenerates the speech in the new language (matching the speaker’s style and tone), and then edits the video so that the speaker’s lips better match the new audio.
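As a rough illustration of that four-stage flow, here is a minimal Python sketch. Every function is a hypothetical placeholder standing in for a full speech or video model; none of these names correspond to Google’s actual service or any real API.

```python
# Hypothetical sketch of the four-stage redubbing pipeline described above.
# Each stage is a toy placeholder, NOT Google's service or a real API.

def transcribe(video_path: str) -> str:
    """Stage 1: speech-to-text on the source audio (placeholder)."""
    return "Welcome to the course."

def translate(text: str, target_lang: str) -> str:
    """Stage 2: machine-translate the transcript (placeholder)."""
    return f"[{target_lang}] {text}"

def synthesize(text: str, voice_ref: str) -> bytes:
    """Stage 3: regenerate speech in the target language, conditioned
    on the original speaker's voice to match style and tone (placeholder)."""
    return text.encode("utf-8")

def lip_sync(video_path: str, audio: bytes) -> str:
    """Stage 4: re-render the video so the speaker's lip movements
    match the new audio track (placeholder)."""
    return video_path.replace(".mp4", "_dubbed.mp4")

def redub(video_path: str, target_lang: str) -> str:
    transcript = transcribe(video_path)
    translated = translate(transcript, target_lang)
    audio = synthesize(translated, voice_ref=video_path)
    return lip_sync(video_path, audio)

print(redub("lecture.mp4", "es"))  # -> lecture_dubbed.mp4
```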

So it’s basically a deepfake generator, right? Yes, but technology that is used maliciously elsewhere can have real utility. There are companies in the media world doing this sort of thing right now, altering recorded lines in post-production for dozens of reasons. (The demo was impressive, but it must be said that the technology still has a long way to go.)

But those are professional tools deployed within a strict media workflow, not a checkbox on a YouTube upload page. Nor is Universal Translator – so far – but if it ever is, Google must consider the possibility that it will be used to create disinformation, or pose other unforeseen dangers.

Manyika called this a “tension between boldness and responsibility,” and striking a balance can be difficult. But it’s clear that the tool can’t simply be released for everyone to use without restrictions. Still, the benefits – for example, making an online course available in 20 languages without subtitles or re-recording – are undeniable.

“This is a huge step forward for learning comprehension, and we’re seeing promising results in course completion rates,” Manyika said. “But there is an inherent tension here: some of the same underlying technology can be misused by bad actors to create deepfakes. That’s why we built the service with guardrails to prevent abuse, and we only make it accessible to authorized partners. Soon we will integrate new innovations in watermarking into our latest generative models to also address the challenge of misinformation.”

That’s certainly a start, but we’ve seen how capable bad actors are when it comes to getting around such roadblocks. The “guardrails” are a bit hand-wavy, and restricting access to partners only works as long as the model doesn’t leak – as models often do. Watermarking is a good path to pursue as well, but so far most approaches have been defeated by trivial operations such as cropping, resizing, and other minor manipulations of the watermarked media.
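To make that watermarking point concrete, here is a toy Python experiment assuming a deliberately naive least-significant-bit watermark (an illustrative assumption; real schemes, including whatever Google deploys, are far more sophisticated). A single downscale-and-upscale pass rewrites the pixel values and leaves the embedded bits unrecoverable:

```python
# Toy demonstration of watermark fragility, assuming a naive
# least-significant-bit (LSB) scheme. Not any production watermark.

import numpy as np
from PIL import Image

def embed_lsb(pixels: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide watermark bits in the least significant bit of each pixel."""
    out = pixels.copy()
    flat = out.reshape(-1)
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return out

def extract_lsb(pixels: np.ndarray, n: int) -> np.ndarray:
    """Read back the first n least significant bits."""
    return pixels.reshape(-1)[:n] & 1

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
mark = rng.integers(0, 2, size=256, dtype=np.uint8)

marked = embed_lsb(image, mark)
assert np.array_equal(extract_lsb(marked, mark.size), mark)  # intact as-is

# One resize down and back up rewrites pixel values and wipes the bits.
resized = np.asarray(
    Image.fromarray(marked).resize((32, 32)).resize((64, 64)),
    dtype=np.uint8,
)
recovered = extract_lsb(resized, mark.size)
print((recovered == mark).mean())  # ~0.5, i.e. no better than chance
```

The fragility here is inherent to hiding information in exact pixel values; surviving geometric edits means embedding the mark in transform-domain features instead, which is much harder to do robustly.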

Google demonstrated many AI capabilities today, both new and familiar, but whether and how they will be both useful and safe remains to be seen. Still, giving someone like Manyika (a researcher himself) stage time at the company’s biggest event to say, in effect, “this could be bad, so here is what we’re doing about it, though who knows if it will work” is at least a fairly honest way to approach the problem.
