Image generators like Stable Diffusion can create what look like real photographs or handcrafted artwork depicting just about anything a person can imagine. This is possible thanks to algorithms that learn to associate the visual properties of a huge collection of images, drawn from the web and from image databases, with their corresponding text labels. The algorithms learn to render new images that match a text prompt through a process of gradually adding random noise to an image and then learning to remove it.
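For readers who want a more concrete picture of that add-and-remove-noise idea, here is a minimal, illustrative sketch in Python. It is not Stable Diffusion's actual code: the noise schedule, the image shape, and especially the placeholder noise_predictor function are assumptions for illustration, standing in for the large neural network that real systems train on image-text pairs.

```python
# Illustrative sketch of the noising / denoising idea behind diffusion models.
# The noise_predictor below is a toy stand-in for a trained neural network.
import numpy as np

rng = np.random.default_rng(0)

T = 1000                              # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)    # how much noise is added at each step
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)       # cumulative signal kept after t steps

def add_noise(x0, t):
    """Forward process: blend a clean image x0 with Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def noise_predictor(xt, t, prompt):
    """Placeholder for the trained network that guesses the noise in xt,
    conditioned on a text prompt. Real systems learn this from data."""
    return np.zeros_like(xt)          # toy stand-in so the loop runs

def denoise(prompt, shape=(8, 8)):
    """Reverse process: start from pure noise and strip predicted noise step by step."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = noise_predictor(x, t, prompt)
        # Move toward a slightly less noisy image (DDPM-style update).
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

sample = denoise("a photo of an astronaut riding a horse")
print(sample.shape)
```

With a real trained predictor in place of the stand-in, repeating the denoising loop is what turns pure static into an image that matches the prompt.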
Because tools like Stable Diffusion use images scraped from the internet, their training data often includes pornographic images, making it possible for the software to generate new sexually explicit pictures. Another concern is that such tools could be used to create images that appear to show a real person doing something compromising, which could spread misinformation.
The quality of AI-generated images has skyrocketed over the past year and a half, starting with the January 2021 announcement of a system called DALL-E by AI research company OpenAI. It popularized the model of generating images from text prompts, and was followed in April 2022 by a more powerful successor, DALL-E 2, now available as a commercial service.
From the start, OpenAI has restricted who can access its image generators, providing access only through an interface that filters what can be requested. The same goes for a competing service called Midjourney, which was released in July this year and popularized AI-made art by being widely accessible.
Stable Diffusion is not the first open source AI art generator. Not long after the original DALL-E was released, a developer built a clone called DALL-E Mini that was open to anyone and quickly became a meme-making phenomenon. DALL-E Mini, later renamed Craiyon, still includes guardrails similar to those in the official versions of DALL-E. Clement Delangue, CEO of Hugging Face, a company that hosts many open source AI projects, including Stable Diffusion and Craiyon, says it would be problematic if the technology were controlled by only a few large companies.
“If you look at the long-term development of the technology, it’s actually better from a security perspective to make it more open, more collaborative, and more inclusive,” he says. Closed technology is harder for outside experts and the public to understand, he says, and it is better if outsiders can assess models for issues such as race, gender, or age bias; closed technology also prevents others from building on it. On balance, he says, the benefits of open sourcing the technology outweigh the risks.
Delangue points out that social media companies could use Stable Diffusion to build their own tools for spotting AI-generated images used to spread misinformation. He says developers have also contributed a system for adding invisible watermarks to images created with Stable Diffusion, making them easier to trace, and have built a tool for finding particular images in the model’s training data so that problematic ones can be removed.
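As a rough idea of what such invisible watermarking looks like in practice, the sketch below uses the open source invisible-watermark Python package to hide and later recover a short byte payload in an image. The file names and the payload are placeholders chosen for illustration, not the exact setup used by Stable Diffusion’s developers.

```python
# Illustrative sketch: embed and read back an invisible watermark
# with the open source invisible-watermark package (pip install invisible-watermark).
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

payload = b"SDV1"                               # 4-byte example payload (32 bits)

# Embed: read an image (OpenCV loads it as a BGR array) and hide the payload
# in its frequency domain using a DWT+DCT method.
image = cv2.imread("generated.png")
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", payload)
watermarked = encoder.encode(image, "dwtDct")
cv2.imwrite("generated_wm.png", watermarked)

# Detect: recover the hidden bytes from the saved image.
decoder = WatermarkDecoder("bytes", len(payload) * 8)
recovered = decoder.decode(cv2.imread("generated_wm.png"), "dwtDct")
print(recovered)                                # b"SDV1" if the watermark survived
```

Because the mark sits in the image’s frequency domain rather than in visible pixels, it is meant to survive casual re-saving, which is what makes it useful for tracing AI-generated pictures.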
After taking an interest in Unstable Diffusion, Simpson-Edin became a moderator on the Unstable Diffusion Discord. The server prohibits people from posting certain types of content, including images that could be interpreted as sexual content involving minors. “We can’t moderate what people do on their own machines, but we are very strict with what is posted,” she says. In the short term, containing the disruptive effects of AI art may depend more on humans than machines.