Apple, Nvidia, and Anthropic used thousands of swiped YouTube videos to train AI

    In response to the lawsuits, defendants including Meta, OpenAI, and Bloomberg have argued that their actions constitute fair use. A case against EleutherAI, which originally scraped the books and made them public, was voluntarily dismissed by the plaintiffs.

    The remaining cases are still in the early stages of litigation, leaving questions of consent and payment unresolved. The Pile has since been removed from the official download site, but is still available on file-sharing services.

    “Technology companies have been hit hard,” said Amy Keller, a consumer protection attorney and partner at the firm DiCello Levitt, which has filed lawsuits on behalf of creators whose work has allegedly been taken by AI companies without their consent.

    “People are concerned that they didn't have a choice in the matter,” Keller said. “I think that's the real issue.”

    Imitating a parrot

    Many creators are unsure about the path they see ahead.

    Full-time YouTubers police their work for unauthorized use and regularly file takedown requests. Some fear it’s only a matter of time before AI can generate content similar to what they create, or even outright copies.

    Pakman, the creator of The David Pakman Show, recently saw the power of AI while scrolling through TikTok. He came across a video labeled as a Tucker Carlson clip, but when Pakman watched it, he was shocked. It sounded like Carlson, but was, word for word, what Pakman had said on his YouTube show, down to the cadence. He was equally shocked that only one of the video’s commenters seemed to recognize that it was fake: a voice clone of Carlson reading Pakman’s script.

    “This is going to be a problem,” Pakman said in a YouTube video he made about the fake. “You can do this to basically anyone.”

    EleutherAI co-founder Sid Black wrote on GitHub that he created YouTube Subtitles using a script that downloads subtitles from YouTube's API in the same way a YouTube viewer's browser downloads them when watching a video. According to documentation on GitHub, Black used 495 search terms to select videos, including “funny vloggers,” “Einstein,” “black protestant,” “Protective Social Services,” “infowars,” “quantum chromodynamics,” “Ben Shapiro,” “Uighurs,” “fruitarian,” “cake recipe,” “Nazca lines,” and “flat earth.”

    Although YouTube's terms of service prohibit accessing videos through “automated means,” more than 2,000 GitHub users have favorited or endorsed the code.

    “There are many ways YouTube could prevent this module from working if that’s what they want,” machine learning engineer Jonas Depoix wrote in a discussion on GitHub, where he posted the code Black used to access YouTube subtitles. “That hasn’t happened yet.”

    In an email to Proof News, Depoix said he hadn't used the code since he wrote it for a project as a university student several years ago and was surprised that people found it useful. He declined to answer questions about YouTube's policies.

    Google spokesman Jack Malon said in an emailed response to a request for comment that the company “has taken steps over the years to prevent abuse and unauthorized scraping.” He did not respond to questions about the use of the material as training data by other companies.

    Among the videos used by AI companies are 146 from Einstein Parrot, a channel with nearly 150,000 subscribers. The African grey’s caretaker, Marcia, who did not want to use her last name for fear of endangering the famous bird’s safety, said she initially found it amusing to hear that AI models had taken words from a mimicking parrot.

    “Who would want to use a parrot's voice?” Marcia said. “But I know he speaks very well. He speaks with my voice. So he parrots me, and then AI parrots the parrot.”

    Once assimilated by AI, data cannot be unlearned. Marcia was concerned about all the unknown ways her bird’s information could be used, including creating a digital duplicate parrot and, she worried, making it swear.

    “We are entering uncharted territory,” said Marcia.