In a separate analysis conducted this week, data journalist Ben Welsh found that just over a quarter of the news websites he examined (294 of the 1,167 mostly English-language, U.S.-based publications) block Applebot-Extended. By comparison, Welsh found that 53 percent of the news websites in his sample block OpenAI’s bot. Google introduced its own AI-specific bot, Google-Extended, last September; it’s blocked by nearly 43 percent of those sites, a sign that Applebot-Extended may still be under the radar. But as Welsh tells WIRED, the number has been “gradually increasing” since he started looking.
Welsh has an ongoing project that monitors how news organizations approach major AI agents. “There’s been a bit of a split among news publishers about whether or not they want to block these bots,” he says. “I don’t have the answer to why every news organization has made its decision. Obviously, we can see that a lot of them have done licensing deals where they get paid in exchange for allowing the bots — maybe that’s a factor.”
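A check along these lines can be approximated with standard tooling. Below is a minimal sketch using Python’s built-in urllib.robotparser, which tests whether a site’s robots.txt disallows a given crawler; it is an illustration of the general technique, not Welsh’s actual methodology, and the site name is a placeholder.

    # Illustrative sketch: does a site's robots.txt block a given crawler?
    # Not Ben Welsh's actual methodology; bot names are from this article.
    from urllib.robotparser import RobotFileParser

    BOTS = ["Applebot-Extended", "GPTBot", "Google-Extended"]

    def blocked_bots(site: str) -> list[str]:
        parser = RobotFileParser()
        parser.set_url(f"https://{site}/robots.txt")
        parser.read()  # fetch and parse the live robots.txt file
        # Treat a bot as "blocked" if it may not fetch the site root.
        return [bot for bot in BOTS if not parser.can_fetch(bot, f"https://{site}/")]

    if __name__ == "__main__":
        print(blocked_bots("example.com"))  # hypothetical domain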
Last year, The New York Times reported that Apple was attempting to strike AI deals with publishers. Since then, competitors OpenAI and Perplexity have announced partnerships with various news outlets, social platforms and other popular websites. “A lot of the world’s largest publishers are clearly taking a strategic approach,” says Jon Gillham, founder of Originality AI. “I think in some cases there’s a business strategy involved, like withholding the data until there’s a partnership agreement.”
There is some evidence to support Gillham’s theory. Condé Nast websites, for example, used to block OpenAI’s web crawlers. After the publisher announced a partnership with OpenAI last week, it unblocked OpenAI’s bots. (Condé Nast declined to comment on the record for this story.) Meanwhile, BuzzFeed spokesperson Juliana Clifton told WIRED that the company, which currently blocks Applebot-Extended, puts every AI web-crawling bot it can identify on its blocklist unless its owner has entered into a partnership (usually paid) with the company, which also owns the Huffington Post.
Because robots.txt must be edited manually, and so many new AI agents are debuting, it can be difficult to keep a blocklist up to date. “People just don’t know what to block,” says Gavin King, founder of Dark Visitors. Dark Visitors offers a freemium service that automatically updates a client site’s robots.txt, and King says publishers make up a large portion of his customers because of copyright concerns.
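For reference, blocking a crawler comes down to adding a user-agent rule to the robots.txt file at a site’s root. A minimal sketch covering the three AI-specific bots discussed above might look like the following, assuming OpenAI’s crawler is addressed by its published GPTBot user agent; note that robots.txt is advisory, so compliance is voluntary on the crawler’s part.

    # Illustrative robots.txt rules; any real publisher's blocklist is longer.
    User-agent: Applebot-Extended
    Disallow: /

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /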
Robots.txt may seem like the obscure domain of webmasters, but given its outsized importance to digital publishers in the AI age, it’s now the domain of media executives. WIRED has learned that two CEOs of major media companies have direct say in which bots should be blocked.
Some media outlets have explicitly stated that they block AI scraping tools because they don’t currently have partnerships with the tools’ owners. “We block Applebot-Extended on all Vox Media properties, as we have done with many other AI scraping tools when we don’t have a commercial agreement with the other party,” says Lauren Starke, senior vice president of communications at Vox Media. “We believe in protecting the value of our published work.”