Gen AI speech and video platforms grapple with a flood of user-generated content

Platforms for AI-generated audio and video are rapidly improving, proving themselves potentially useful in areas from entertainment to HR. But the industry’s leaders acknowledge that there are risks, and that they’ll have to be careful about enforcing moderation policies that prevent users from using AI to impersonate public figures or commit fraud.

Speaking with Fortune senior writer Jeremy Kahn at Fortune’s Brainstorm AI conference in London, Synthesia CEO Victor Riparbelli and ElevenLabs CEO Mati Staniszewski said that they’re still figuring out how to ensure that the voice cloning and video generation tech their companies provide is used for good.

“As with most new tech, we immediately go to what things can go wrong. In this case, that’s right—this will be used by bad people to do bad things, that’s for sure,” said Riparbelli, co-founder of the video-generation platform Synthesia.

Using AI-generated audio or video to impersonate public figures has emerged as a controversial topic. The past few months have seen the creation of explicit deepfakes of Taylor Swift and fake Joe Biden robocalls, leading observers to worry about how AI-generated content will influence this fall’s U.S. presidential election.

OpenAI recently announced it would delay the rollout of its AI voice cloning tool due to misuse risks, highlighting the possible political implications: “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” it wrote in a statement.

Staniszewski, co-founder of AI voice cloning company ElevenLabs, told Fortune’s Kahn that his company invests in know-your-customer protocols and mandatory disclosures to ensure that all content generated on the platform is linked back to a specific user’s profile. The company is also exploring different ways to make it clear to users what content is AI-generated and what’s not, he said.

“All the content that is generated by ElevenLabs can be traced back to a specific account,” Staniszewski said. “One thing we are advocating for is going beyond watermarking what’s AI and watermarking what’s real content.”

Riparbelli said that Synthesia has protocols in place requiring users to verify their identity and give their consent before creating AI-generated videos.

“It is impossible today to go in and take a YouTube video and make clones of someone [on Synthesia]. We take control that way,” Riparbelli said. “We have pretty heavy content moderation, rules about what you can create and what you cannot create.”

A reporter asked about the potential risks of audio deepfakes, citing London Mayor Sadiq Khan, who last November was the target of a viral audio clip that impersonated him criticizing pro-Palestine marches.

“Parliament needs to wake up and understand that if they don’t take action, it’ll provide opportunities for mischief makers to be bolder,” Khan told the BBC.

“All the content out there should be known as AI-generated, and there should be tools that allow you to quickly get that information as a user…so Sadiq Khan can send out a message and we can verify that this is a real message,” Staniszewski said.

Riparbelli said that it would likely take time for the industry and lawmakers to reach a consensus on how best to use and regulate tools like the ones his and Staniszewski’s companies offer.

“As with any new technology, you will have these years where people are figuring out what’s right and what’s wrong,” Riparbelli said.

Eugen Boglaru