AI companies have consumed the entire internet to train their models and are now running out of data – Firstpost

AI companies may soon be forced to train their upcoming AI models on AI-generated data. Image Credit: Reuters

AI companies are facing a monumental challenge, one that could render pointless the billions of dollars Big Tech is investing in them: they are running out of internet.

In the race to develop ever-larger and more advanced large language models, AI companies have practically consumed all of the open internet, and are now facing the imminent end of data, as reported by the Wall Street Journal.

This issue is pushing some firms to seek alternative sources for training data, such as publicly available video transcripts and the creation of AI-generated “synthetic data”. However, using AI-generated data to train AI models is a problem in and of itself — it leads to a higher chance of AI models hallucinating.

Furthermore, discussions around synthetic data have raised serious concerns about the potential consequences of training AI models on AI-generated data. Experts believe that relying too heavily on AI-generated data leads to digital “inbreeding”, which could eventually cause the AI model to collapse in on itself.

While companies like DatologyAI, founded by former Meta and Google DeepMind researcher Ari Morcos, are exploring methods to train expansive models with less data and fewer resources, most major players are experimenting with rather unconventional and contentious approaches to sourcing training data.

OpenAI, for example, is considering training its GPT-5 model on transcriptions of publicly available YouTube videos, according to sources cited by the WSJ, even though the company is already facing criticism for using such videos to train Sora and may face lawsuits from video creators.

Nevertheless, companies like OpenAI and Anthropic are planning to address the shortage by developing higher-quality synthetic data, although the specifics of their methodologies remain unclear.

Fears of AI companies running out of data have been circulating for quite some time. Although some, like Epoch researcher Pablo Villalobos, predict that AI could exhaust its usable training data in the coming years, there is a prevailing sentiment that significant breakthroughs could mitigate these concerns.

However, an alternative solution to this dilemma exists: AI companies could opt to refrain from pursuing larger and more advanced models, considering the environmental toll associated with their development, including significant energy consumption and the reliance on rare-earth minerals for computing chips.

(With inputs from agencies)

 
