This copyright lawsuit could shape the future of generative AI

    The tech industry may be reeling from a wave of layoffs, a dramatic crypto crash, and ongoing turmoil at Twitter, but despite those clouds, some investors and entrepreneurs are already eyeing a new boom, one built on artificial intelligence that delivers coherent text, compelling graphics, and functional computer code. But that new frontier has its own looming cloud.

    A class action lawsuit filed this month in a California federal court targets GitHub Copilot, a powerful tool that automatically writes working code when a programmer starts typing. The coder behind the lawsuit claims that GitHub is infringing copyright because it does not provide attribution when Copilot reproduces open-source code under a license that requires it.

    The lawsuit is in its early stages, and its prospects are unclear because the underlying technology is new and has not yet undergone much legal scrutiny. But legal experts say it could affect the broader wave of generative AI tools. AI programs that generate paintings, photos, and illustrations from a prompt, as well as text for marketing copy, are all built with algorithms trained on previous human-created work.

    Visual artists were the first to question the legality and ethics of AI incorporating existing work. Some people who make a living from their visual creativity are angry that AI art tools trained on their work can produce new images in the same style. The Recording Industry Association of America, a music industry group, has indicated that AI-powered music generation and remixing could be a new area of copyright concern.

    “This whole arc that we’re seeing now — this generative AI space — what does it mean for these new products to suck up the work of these creators?” said Matthew Butterick, a designer, programmer and attorney who filed the lawsuit against GitHub.

    Copilot is a powerful example of the creative and commercial potential of generative AI technology. The tool was created by GitHub, a subsidiary of Microsoft that hosts the code for hundreds of millions of software projects. GitHub built it by taking a code-generating algorithm from AI startup OpenAI and training it on the massive collection of code it stores, creating a system that can preemptively complete large chunks of code after a programmer makes a few keystrokes. A recent study by GitHub suggests that programmers using Copilot can complete some tasks in less than half the time they would normally take.

    But as some programmers quickly noticed, Copilot will occasionally reproduce recognizable code snippets drawn from the millions of lines in public code repositories. The lawsuit filed by Butterick and others accuses Microsoft, GitHub, and OpenAI of copyright infringement because the reproduced code does not carry the attribution required by the open source licenses that cover it.

    Programmers have, of course, always studied, learned from, and copied each other’s code. But not everyone is sure it’s fair for AI to do the same, especially if AI can then produce tons of valuable code on its own, without respecting the licensing requirements of the source material. “As a technologist, I’m a big fan of AI,” says Butterick. “I look forward to all the possibilities of these tools. But they have to be fair to everyone.”