The chatbot that millions of people have used to write term papers, computer code and fairy tales doesn’t just do words. ChatGPT, OpenAI’s artificial intelligence-powered tool, can also analyze images – describing what’s in them, answering questions about them and even recognizing specific people’s faces. The hope is that, eventually, someone will be able to upload a picture of a broken-down car’s engine or a mysterious rash and ChatGPT will suggest the fix.
What OpenAI doesn’t want ChatGPT to become is a facial recognition engine.
In recent months, Jonathan Mosen has been among a select group of people with access to an advanced version of the chatbot that can analyze images. On a recent trip, Mr. Mosen, an employment agency chief executive who is blind, used the visual analysis to determine which dispensers in a hotel room bathroom contained shampoo, conditioner and shower gel. It went far beyond the performance of any image analysis software he had used in the past.
“It told me the milliliter capacity of each bottle. It told me about the tiles in the shower,” Mr. Mosen said. “It described all of this in a way that a blind person needs to hear it. And with one photo, I had exactly the answers I needed.”
For the first time, Mr. Mosen was able to “interrogate images,” he said. He gave an example: text accompanying an image he came across on social media described it as a “woman with blond hair who looks happy.” When he asked ChatGPT to analyze the image, the chatbot said it was a woman in a dark blue shirt taking a selfie in a full-length mirror. He could ask follow-up questions, such as what kind of shoes she was wearing and what else was visible in the mirror’s reflection.
“It’s extraordinary,” said Mr. Mosen, 54, who lives in Wellington, New Zealand, and has demonstrated the technology on a podcast he hosts about “living blindfully.”
In March, when OpenAI announced GPT-4, the latest software model powering its AI chatbot, the company said it was “multimodal,” meaning it could respond to text and image prompts. While most users could converse with the bot only in words, Mr. Mosen has had early access to the visual analysis through Be My Eyes, a start-up that usually connects blind users with sighted volunteers and provides accessible customer service to corporate clients. Be My Eyes partnered with OpenAI this year to test the chatbot’s “sight” before releasing the feature to the general public.
Recently, the app stopped giving Mr. Mosen information about people’s faces, saying they had been obscured for privacy reasons. He was disappointed, feeling that he should have the same access to information as a sighted person.
The change reflected OpenAI’s concern that it had built something with a power it didn’t want to release.
The company’s technology can identify primarily public figures, such as people with a Wikipedia page, said Sandhini Agarwal, an OpenAI policy researcher, but it doesn’t work as comprehensively as tools built for finding faces on the internet, such as those from Clearview AI and PimEyes. The tool can spot OpenAI’s chief executive, Sam Altman, in photos, Ms. Agarwal said, but not other people who work at the company.
Making such a feature publicly available would push the boundaries of what is generally considered acceptable practice by U.S. technology companies. It could also cause legal trouble in jurisdictions, such as Illinois and Europe, that require companies to obtain citizens’ consent to use their biometric information, including a faceprint.
In addition, OpenAI was concerned that the tool would say things it shouldn’t about people’s faces, such as assessing their gender or emotional state. OpenAI is figuring out how to address these and other safety concerns before the image analysis feature is widely released, Ms. Agarwal said.
“We really want this to be a two-way conversation with the public,” she said. “If what we’re hearing is like, ‘We don’t want any of it,’ that’s something we’re very on board with.”
In addition to feedback from Be My Eyes users, the company’s non-profit division is also trying to figure out ways to get “democratic input” to help set rules for AI systems.
Ms. Agarwal said the development of visual analysis was not “unexpected,” because the model was trained by looking at images and text collected from the internet. She pointed out that celebrity facial recognition software already existed, such as a tool from Google. Google offers an opt-out for famous people who don’t want to be recognized, and OpenAI is considering that approach.
Ms. Agarwal said OpenAI’s visual analysis could produce “hallucinations” similar to what had been seen with text prompts. “If you give it a picture of someone who’s about to become famous, it might hallucinate a name,” she said. “For example, if I give it a photo of a famous tech CEO, I might get the name of another tech CEO.”
The tool once inaccurately described a remote control to Mr. Mosen, confidently telling him it had buttons on it that weren’t there, he said.
Microsoft, which has invested $10 billion in OpenAI, also has access to the visual analysis tool. Some users of Microsoft’s AI-powered Bing chatbot have seen the feature appear in a limited rollout; after uploading images, they received a message stating that “privacy blur hides faces from Bing chat.”
Sayash Kapoor, a computer scientist and doctoral student at Princeton University, used the tool to decode a CAPTCHA, a visual security check meant to be intelligible only to human eyes. Even as it broke the code and recognized both obscured words, the chatbot noted that “CAPTCHAs are designed to prevent automated bots like me from accessing certain websites or services.”
“AI just blows through all the things that are supposed to separate humans from machines,” said Ethan Mollick, an associate professor who studies innovation and entrepreneurship at the University of Pennsylvania’s Wharton School.
Since the visual analysis tool suddenly appeared last month in Mr. Mollick’s version of Bing’s chatbot – making him, without any notice, one of the few people with early access – he hasn’t shut down his computer for fear of losing it. He gave it a photo of condiments in a refrigerator and asked Bing to suggest recipes for them. It came up with “whipped cream soda” and a “creamy jalapeño sauce.”
Both OpenAI and Microsoft seem to be aware of the power – and the potential privacy implications – of this technology. A Microsoft spokesperson said the company “did not share any technical details” about the face blurring but was “working closely with our partners at OpenAI to uphold our shared commitment to the safe and responsible use of AI technologies.”