OpenAI has messed with the wrong mega-popular parenting forum

    Think of any topic vaguely related to raising children, and there’s probably a post about it on Mumsnet, the long-running, hugely popular, controversial parenting forum for mothers in the UK. Over its two-decade history, Mumsnet has amassed an archive of over six billion words written by its highly engaged user base, on topics ranging from dirty nappies to lazy husbands. (Not to mention an insane rant about dolphins.)

    This spring, after Mumsnet discovered that AI companies were scraping its data, the company decided to pursue licensing deals with some of the industry’s major players, including OpenAI, which initially expressed willingness to explore a deal after Mumsnet reached out. After talks with OpenAI failed, Mumsnet announced in July that it would be taking legal action.

    According to Mumsnet, during those initial conversations, a strategic partner of OpenAI told the company that datasets of more than 1 billion words would be of interest to the AI giant. Mumsnet executives were enthusiastic. “We spent quite a bit of time with them in a back-and-forth conversation,” Justine Roberts, founder and CEO of Mumsnet, tells WIRED. “We had to sign a bunch of non-disclosure agreements, and they wanted a lot of information from us.”

    However, more than a month later, OpenAI told Mumsnet that it was no longer interested in a partnership, according to an email exchange reviewed by WIRED. When asked why, an OpenAI employee characterized Mumsnet’s six-billion-word dataset as too small to warrant a licensing deal, Roberts says. The employee also noted that OpenAI was primarily interested in large datasets that the public can’t yet access online, and that it wanted datasets that captured a broad range of human experiences.

    This sentiment was echoed by the company when WIRED asked for comment. “We seek partnerships on large-scale datasets that reflect human society, and do not seek partnerships solely on publicly available information,” said Kayla Wood, a spokesperson for OpenAI. “We support publisher and creator choice, provide ways for them to express their preferences about how their sites and content interact with AI in search results, and train generative AI foundational models.”

    Roberts says she was “annoyed” by the development. She recalls that OpenAI initially seemed interested in Mumsnet mainly because of the platform’s predominantly female-authored content. “It’s very high-quality conversation data,” she says. “It’s 90 percent female conversation, which is quite unusual.”

    OpenAI has signed a number of data licensing agreements with media outlets and platforms over the past year, including agreements with Vox Media, The Atlantic, Axel Springer, Time, and WIRED parent company Condé Nast, as well as platforms full of user-generated content like Reddit. (Automattic, the owner of WordPress.com and Tumblr, was also reportedly in licensing talks earlier this year.) Because the details of those deals were not disclosed, it’s not clear how large their respective corpora are.

    When WIRED asked about the size of the datasets it would consider for commercial licensing, OpenAI declined to share that information. But spokesperson Kayla Wood stressed that the company’s partnerships with publishers “are focused on displaying their content in our products and driving traffic to them.”