Recent Investigation Suggests Apple Trained Its AI Models On YouTube Videos Without Authorization, Including MKBHD's Content

Earlier, OpenAI, Meta, and Google were criticized for transcribing YouTube videos to train their AI models, violating the copyrights of content creators. Now, a new report suggests Apple has followed in the footsteps of the other tech giants, training its LLMs on transcripts of video content without the consent of the creators, including some well-known tech reviewers.

Apple is in hot water after using content creators' YouTube videos without their consent, infringing on the creators' copyrights

Lately, tech giants have been using YouTube videos to train AI models without creators' consent, which has stirred considerable concern. Now, Apple, along with other big companies, has found itself in the middle of the controversy for violating creators' copyrights by using their content without permission.

Wired reported that third parties downloaded subtitle files from the videos, which were then used to train LLMs. It claimed that over 170,000 videos were utilized, including content from well-known YouTubers such as MKBHD, Jimmy Kimmel, PewDiePie, and MrBeast, among many other creators.

The report highlights that these big AI companies have been using the content in their training pipelines even though harvesting material this way violates YouTube's rules against accessing its videos through independent applications and automated means without permission.

An investigation by Proof News found some of the wealthiest AI companies in the world have used material from thousands of YouTube videos to train AI. Companies did so despite YouTube’s rules against harvesting materials from the platform without permission. Our investigation found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Silicon Valley heavyweights, including Anthropic, Nvidia, Apple, and Salesforce.

Although the transcription was not performed by Apple itself but by EleutherAI, a nonprofit that compiles such material for educational and academic purposes and to help train developers, the company still ended up in controversy for using the dataset without the creators' consent.

The compilations are openly available to academics and developers, but tech giants have used them to train their high-profile models. Apple is said to have used the third-party dataset, known as the Pile, to train OpenELM, which was launched in April.

Such a situation raises questions about consent and ethical AI practices, the implications of which could be multifaceted if not handled with care. We have yet to hear Apple's take on the ongoing concerns.
