
How to prevent ChatGPT from going off the rails

    When WIRED asked me to write this week’s newsletter, my first instinct was to ask ChatGPT – OpenAI’s viral chatbot – what it could come up with. That’s what I’ve been doing all week with emails, recipes, and LinkedIn posts. Productivity is way down, but cheeky limericks about Elon Musk are up 1,000 percent.

    I asked the bot to write a column about itself in the style of Steven Levy, but the results weren’t great. ChatGPT offered generic commentary about the promise and pitfalls of AI, but didn’t really capture Steven’s voice or say anything new. As I wrote last week, it was fluent but not entirely convincing. It did get me thinking, though: would I have gotten away with it? And what systems might catch people using AI for things they shouldn’t, whether that’s business emails or college essays?

    To find out, I spoke to Sandra Wachter, a professor of technology and regulation at the Oxford Internet Institute, who speaks eloquently about building transparency and accountability into algorithms. I asked her what that might look like for a system like ChatGPT.

    Amit Katwala: ChatGPT can write anything from classic poetry to boring marketing copy, but a major point of discussion this week was whether it could help students cheat. Do you think you could tell if any of your students had used it to write a paper?

    Sandra Wachter: This is going to be a game of cat and mouse. The technology may not yet be good enough to fool me as a person who teaches law, but it may be good enough to convince someone who is not in that field. I wonder whether the technology will eventually get so good that it can mislead me too. We may need technical tools to make sure that what we are looking at was written by a human, just as we have tools for detecting deepfakes and edited photos.

    That seems inherently harder to do for text than for deepfaked images, because there are fewer artifacts and telltale signs. Perhaps any reliable solution will need to be built by the company that generates the text in the first place.

    You do need buy-in from whoever makes that tool. But if I’m offering services to students, I may not be the kind of company that will submit to that. And there may be a situation where, even if you put watermarks in place, they can be removed. Highly tech-savvy groups will probably find a way around them. But there is a real technical tool [built with OpenAI’s input] that lets you detect whether output is artificially created.

    What would a version of ChatGPT designed with damage control in mind look like?

    A couple of things. First, I would really urge whoever makes these tools to put watermarks in place. And maybe the EU’s proposed AI Act can help, as it deals with transparency around bots, saying you should always be made aware when something isn’t real. But maybe companies don’t want that, and maybe the watermarks could be removed. So it’s about encouraging research into independent tools that examine AI output. And in education, we need to be more creative in how we assess students and how we write papers: what kinds of questions can we ask that are harder to fake? It has to be a combination of technical and human oversight that helps us curb the disruption.
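
    As a rough illustration of the watermarking idea Wachter describes: in published research on statistical watermarking, the generator quietly biases each word choice toward a pseudorandom “green list,” and a detector counts how often a passage lands on that list. The toy Python sketch below is purely illustrative – not OpenAI’s method and not any real product – but it shows the shape of the statistical test such a tool would run: human-written text should land on the green list about half the time, while watermarked output would land on it noticeably more often.

        import hashlib

        def is_green(prev_word: str, word: str) -> bool:
            # Pseudorandomly assign roughly half of all words to a "green list"
            # keyed on the previous word; a watermarking generator would bias
            # its sampling toward these words at every step.
            digest = hashlib.sha256(f"{prev_word}|{word}".lower().encode()).digest()
            return digest[0] % 2 == 0

        def green_fraction(text: str) -> float:
            # Fraction of words in the passage that fall on the green list.
            # Unwatermarked human text should hover near 0.5; watermarked
            # output would score noticeably higher, which is the detector's signal.
            words = text.lower().split()
            if len(words) < 2:
                return 0.0
            hits = sum(is_green(words[i - 1], words[i]) for i in range(1, len(words)))
            return hits / (len(words) - 1)

        if __name__ == "__main__":
            sample = "The quick brown fox jumps over the lazy dog."
            print(f"green fraction: {green_fraction(sample):.2f}")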