YouTube’s Videos Can’t Be Used to Train OpenAI’s Sora

YouTube CEO Neal Mohan has addressed concerns about the use of YouTube content to train artificial intelligence models.

In an interview with Bloomberg, Mohan said there is no concrete evidence of whether OpenAI has used YouTube videos to train its video-generation AI software, Sora. However, he stressed that any such use without permission would violate YouTube's terms of service.

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations,” Mohan said, per the report. “One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”

Setting Expectations

Mohan elaborated on the expectations creators have when they upload their work to YouTube, emphasizing that the platform's terms prohibit downloading or otherwise using videos and transcripts without authorization. That policy underpins trust between YouTube and its users by ensuring creators' work is protected under the platform's guidelines.

The issue comes amid an ongoing debate, and legal battles, over the data AI companies use to train their models. OpenAI, a Microsoft-backed player in the field, has been at the center of that debate for its use of diverse web content in developing ChatGPT and DALL-E as well as Sora.

The pursuit of advanced AI capabilities has led companies to seek vast amounts of data, raising questions about the ethical use of content sourced from the internet. OpenAI has been sued by The New York Times, as well as by other publications and authors, over its use of their content to train its models.

Lack of Clarity

Despite inquiries, OpenAI has not clarified whether or how it used YouTube videos to develop Sora, and Chief Technology Officer Mira Murati has previously expressed uncertainty on the point. The same ambiguity surrounds the training of OpenAI's forthcoming model, GPT-5: reports suggest the company has considered transcriptions of public YouTube videos as a potential data source.

In the Bloomberg interview, Mohan also discussed how Google uses YouTube content for its own AI model, Gemini. He described a cautious process that respects individual agreements with creators: some YouTube content may be used for AI training, but only in line with the terms of service and any existing contracts with content owners.

Meanwhile, according to a separate Bloomberg report, OpenAI is seeing growing demand for the corporate version of ChatGPT, even as a growing number of AI companies offer competing workplace products. OpenAI Chief Operating Officer Brad Lightcap said ChatGPT Enterprise now has more than 600,000 users, up from about 150,000 in January.

For all PYMNTS AI coverage, subscribe to the daily AI Newsletter.

Eugen Boglaru

Eugen Boglaru is an AI aficionado covering the fascinating and rapidly advancing field of Artificial Intelligence. From machine learning breakthroughs to ethical considerations, Eugen provides readers with a deep dive into the world of AI, demystifying complex concepts and exploring the transformative impact of intelligent technologies.
