The only exception to this is UMG v. Anthropic. In that case, earlier versions of Anthropic's model would, at least early on, generate song lyrics in their output. That's a problem. The current status of that case is that Anthropic has put safeguards in place to try to prevent that, and the parties have more or less agreed that, pending resolution of the case, those safeguards are sufficient, so the plaintiffs are no longer seeking a preliminary injunction.
Ultimately, the more difficult question for the AI companies isn't whether it's legal to train on the material. It's what you do when your AI generates output that is too similar to a particular work.
Do you expect the majority of these cases to go to trial, or do you see settlements on the horizon?
There may be settlements. Where I expect them is with major players who have large amounts of content, or content that is particularly valuable. The New York Times could end up with a settlement and a licensing deal, perhaps with OpenAI paying money to use its content.
There's enough money at stake that we'll probably get at least some rulings that set the parameters. I feel like the class-action plaintiffs have stars in their eyes. There are many class actions, and I suspect the defendants will oppose them and hope to win on summary judgment. It's not obvious that these cases will ever get in front of a jury. The Supreme Court in Google v. Oracle pushed fair-use law very strongly toward being resolved on summary judgment, not by a jury. I think the AI companies will do their best to get these cases decided on summary judgment.
Why would it be better for them to win on summary judgment versus a jury verdict?
It is faster and cheaper than going through a full trial. And the AI companies worry that they won't be seen as sympathetic, and that many jurors will think, "Oh, you made a copy of the work; that should be illegal," without getting into the details of the fair-use doctrine.
There have been many deals between AI companies and media outlets, content providers, and other rights holders. Most of the time these deals seem to be more about search than about foundation models, or at least that's how they've been described to me. In your view, is there a greater legal need to license content for AI search engines (where answers are generated via retrieval-augmented generation, or RAG)? Is that why they're doing it this way?
If you use retrieval-augmented generation for targeted, specific content, your fair-use argument becomes more challenging. A RAG query is much more likely to produce text pulled directly from a given source in the output, and that output is much less likely to constitute fair use. I mean, it could be, but the risky area is that it's much more likely to compete with the original source material. If, instead of directing people to a New York Times story, my RAG-based AI pulls the text directly from that New York Times story, that looks like a substitute that could harm the New York Times. The legal risk is greater for the AI company.
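To make the mechanism concrete, here is a minimal, purely illustrative sketch of a RAG pipeline (all names and data are hypothetical, not any particular company's system). The key point it shows: retrieved source text is pasted directly into the model's prompt, which is why that text can reappear nearly verbatim in the answer.

```python
# Minimal RAG sketch (hypothetical names and toy data throughout).
# Retrieved source text is placed directly in the generation prompt,
# which is why it can flow straight into the model's output.

from dataclasses import dataclass


@dataclass
class Document:
    source: str
    text: str


# A toy "index" standing in for a real search backend.
CORPUS = [
    Document("example-news-site.com/story-1",
             "The city council voted 7-2 on Tuesday to approve the new budget."),
    Document("example-news-site.com/story-2",
             "Researchers reported a 12 percent drop in commute times after the change."),
]


def retrieve(query: str, corpus: list[Document], k: int = 1) -> list[Document]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q_words & set(d.text.lower().split())),
                    reverse=True)
    return scored[:k]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Paste the retrieved source text verbatim into the model's context."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    return (f"Answer the question using the sources below.\n"
            f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:")


# In a real system this prompt would be sent to an LLM; printing it is
# enough to see that the source article now sits inside the model's input.
query = "How did the council vote on the budget?"
print(build_prompt(query, retrieve(query, CORPUS)))
```

A real deployment would use embedding-based retrieval and an actual language model, but the structure is the same: whatever the retriever returns becomes part of the input the model draws from, word for word.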
What do you want people to know about the generative AI copyright battles that they might not know yet, or might be misinformed about?
The thing I hear most often that is technically wrong is the idea that these are just plagiarism machines, that all they do is take my stuff and spit it back out in the form of text and responses. I hear a lot of artists say that, and I hear a lot of laypeople say that, and it's just technically not true. You can decide whether generative AI is good or bad. You can decide whether it is legal or illegal. But it really is something fundamentally new that we have not experienced before. The fact that a model has to train on a lot of content to learn how sentences work, how arguments work, and various facts about the world doesn't mean it just copies and pastes things or makes a collage. It really does generate things no one could expect or predict, and it gives us a lot of new content. I find that important and valuable.