YouTube Subtitles Add Explicit Language to Kids Videos

    “It’s surprising and disturbing,” said Ashique KhudaBukhsh, an assistant professor at the Rochester Institute of Technology who studied the issue with associates Krithika Ramesh and Sumeet Kumar at the Indian School of Business in Hyderabad.

    Automated captioning is not available on YouTube Kids, the version of the service aimed at children. But many families use the standard version of YouTube, where the captions can be seen. Pew Research Center reported in 2020 that 80 percent of parents of children 11 or younger said their child watched YouTube content; more than 50 percent of those children did so daily.

    KhudaBukhsh hopes the research will draw attention to a phenomenon he says has received little attention from tech companies and researchers, and which he calls “inappropriate content hallucination”: when algorithms add inappropriate material that isn’t present in the original content. Think of it as the flip side of the common observation that autocomplete on smartphones often filters adult language to an annoying degree.

    YouTube spokesperson Jessica Gibby says that children under 13 are advised to use YouTube Kids, where automatic captions are not visible. On YouTube’s standard version, she says the feature improves accessibility. “We’re constantly working to improve automatic captioning and reduce errors,” she says. Alafair Hall, a spokesperson for Pocket.watch, a children’s animation studio that publishes Ryan’s World content, said in a statement that the company is “in close and immediate contact with our platform partners such as YouTube who are working to update incorrect video captions.” The operator of the Rob de Robot channel could not be reached for comment.

    Inappropriate hallucinations are not unique to YouTube or video captions. A WIRED reporter found that a transcript of a phone call processed by the startup Trint rendered Negar, a woman’s name of Persian origin, as a variant of the N-word, even though it sounds distinctly different to the human ear. Trint CEO Jeffrey Kofman says the service has a coarse-language filter that automatically redacts “a very small list of words.” The specific spelling that appeared in WIRED’s transcript wasn’t on that list, Kofman said, but it will be added.

    “The benefits of speech-to-text are undeniable, but there are blind spots in these systems that may require checks and balances,” says KhudaBukhsh.

    Those blind spots can seem surprising to humans, who understand speech in part by grasping the broader context and meaning of a person’s words. Algorithms have improved their ability to process language but still lack the capacity for fuller understanding, which has created problems for other companies that rely on machines to process text. One startup had to revamp its adventure game after it was found to sometimes describe sexual scenarios involving minors.

    Machine learning algorithms “learn” a task by processing large amounts of training data, in this case audio files and matching transcripts. KhudaBukhsh says YouTube’s system sometimes inserts profanity because its training data consists mostly of speech from adults and relatively little from children. When the researchers manually checked examples of inappropriate words in captions, they often occurred with speech from children or from people who did not appear to be native English speakers. Previous studies have shown that transcription services from Google and other major tech companies make more errors for non-white speakers, and more errors for regional American dialects than for standard American English.

    Rachael Tatman, a linguist who co-authored one of those earlier studies, says a simple block list of words not to be used on children’s YouTube videos would address many of the worst examples from the new research. “That apparently there isn’t one is a technical oversight,” she says.
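The block list Tatman describes amounts to a small post-processing step applied to caption text before it is displayed. The sketch below is purely illustrative; the word list, function name, and masking behavior are assumptions for demonstration, not YouTube’s actual implementation:

```python
import re

# Hypothetical deny list for children's content; a real one would be
# curated by humans and far longer. Placeholder words stand in for
# actual profanity.
BLOCKED_WORDS = {"profanity1", "profanity2"}

def redact_caption(text: str, blocked: set = BLOCKED_WORDS) -> str:
    """Mask whole-word, case-insensitive matches of blocked words
    with asterisks, leaving the rest of the caption unchanged."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in blocked) + r")\b",
        re.IGNORECASE,
    )
    # Replace each match with asterisks of the same length.
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)
```

Applied as a final filter on a transcription system’s output, a list like this would not fix the underlying recognition errors, but it would keep the worst hallucinated words from reaching viewers.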