A startup's plan to solve AI's 'shoplifting' problem

    Bloomberg via Getty

    Bill Gross made his name in the tech world in the 1990s when he invented a new way for search engines to make money from advertising. Under his pricing scheme, advertisers would pay when people clicked on their ads. Now, the “pay-per-click” man has founded a startup called ProRata with a bold, possibly unrealistic business model: “AI pay-per-use.”

    Gross, CEO of the Pasadena, California-based company, doesn’t mince words when it comes to the generative AI industry. “It’s stealing,” he says. “They’re stealing and laundering the world’s knowledge for their own benefit.”

    AI companies often claim that they need vast amounts of data to create sophisticated generative tools, and that scraping data from the internet—whether it’s text from websites, videos or subtitles from YouTube, or books from pirate libraries—is legal. Gross doesn’t buy that argument. “I think it’s nonsense,” he says.

    So do many media executives, artists, writers, musicians and other rights holders who are fighting back. It’s hard to keep track of all the copyright lawsuits being filed against AI companies by rights holders who allege that the companies’ practices amount to theft.

    But Gross thinks ProRata offers a solution that trumps legal battles. “To make it fair, that's what I'm trying to do,” he says. “I don't think this should be solved by litigation.”

    His company wants to set up revenue-sharing deals so that publishers and individuals get paid when AI companies use their work. As Gross explains it, “We can take the output of generative AI, whether it’s text or an image or music or a movie, and break it down into its component parts, figure out where it came from, and then give a percentage attribution to each copyright holder, and then pay them accordingly.” ProRata has filed patent applications for the algorithms it created to assign attribution and make appropriate payments.

    This week, the company, which has raised $25 million, launched with a number of major partners, including Universal Music Group, the Financial Times, The Atlantic and media company Axel Springer. It has also struck deals with writers with large followings, including Tony Robbins, Neal Postman and Scott Galloway. (It has also partnered with former White House communications director Anthony Scaramucci.)

    Even journalism professor Jeff Jarvis, who believes scraping the web for AI training is fair use, has joined in. He tells WIRED that it's smart for people in the news industry to work together to give AI companies access to “credible and timely information” to include in their output. “I hope ProRata can open up the conversation about what APIs [application programming interfaces] can become for different content,” he says.

    After the company’s initial announcement, Gross says he received a flood of messages from other companies asking to sign on, including a text from Time CEO Jessica Sibley. ProRata has struck a deal with Time, the publisher confirmed to WIRED. Gross also plans to sign deals with well-known YouTubers and other individual online stars.

    The key word here is “plans.” The company is still in its early stages, and Gross is talking a big game. As a proof of concept, ProRata is launching its own subscription chatbot-style search engine in October. Unlike other AI search products, ProRata’s search tool will use only licensed data. There will be no scraping using a web crawler. “Nothing from Reddit,” he says.

    Ed Newton-Rex, a former Stability AI executive who now runs ethical data licensing nonprofit Fairly Trained, is excited about ProRata’s debut. “It’s great to see a generative AI company licensing training data before releasing their model, unlike the approach taken by many other companies,” he says. “The deals they’ve done further demonstrate that media companies are open to working with good actors.”

    Gross wants the search engine to demonstrate that data quality is more important than quantity, and believes that limiting the model to reliable sources of information will curb hallucinations. “I argue that 70 million good documents are actually better than 70 billion bad documents,” he says. “It will lead to better answers.”

    What's more, Gross believes he can get enough people to sign up for this AI search engine with fully licensed data to make enough money to pay his data providers their allotted share. “Every month, the partners get a statement from us saying, 'Here's what people are searching for, here's how your content has been used, and here's your pro rata check,'” he says.

    Other startups are already vying for prominence in this new world of training data licensing, including marketplaces TollBit and Human Native AI. A nonprofit called the Dataset Providers Alliance was founded earlier this summer to advocate for more standards in licensing; founders include services like the Global Copyright Exchange and Datarade.

    ProRata’s business model is based in part on a plan to license its attribution and payments technologies to other companies, including big AI players. Some of those companies have begun to strike their own deals with publishers. (The Atlantic and Axel Springer, for example, have agreements with OpenAI.) Gross hopes that AI companies will find ProRata’s models more affordable than creating them themselves.

    “I'll license the system to anyone who wants to use it,” Gross says. “I want to make it so cheap that it's like a Visa or MasterCard fee.”

    This story originally appeared on wired.com.