You Probably Underestimate AI Chatbots

In the spring of 2007, I was one of four journalists picked by Steve Jobs to review the iPhone. This was probably the most anticipated product in the history of technology. What would it be like? Was it a turning point for devices? Looking back at my review today, I’m relieved to say it’s not an embarrassment: I recognized the device’s generational significance. But for all the praise I heaped on the iPhone, I failed to anticipate its stunning secondary effects, such as the volcanic melding of hardware, operating system, and apps, or its mesmerizing effect on our attention. (I did urge Apple to “encourage outside developers to create new applications” for the device.) Nor did I suggest that we should expect the rise of services like Uber or TikTok, or predict that family dinners would turn into communal display-centric trances. My primary job, of course, was to help people decide whether they should spend $500, which was super expensive for a phone at the time, to buy the damn thing. But reading the review now, you might wonder why I spent time complaining about AT&T’s network or the web browser’s inability to handle Flash content. That’s like bickering over which sandals to wear just as a three-story tsunami is about to break.

I am reminded of my lack of foresight when I read about people’s experiences with recent AI apps, such as large language model chatbots and AI image generators. People are rightly obsessed with the impact of a sudden rush of shockingly capable AI systems, though scientists often note that these seemingly rapid breakthroughs have been decades in the making. But just as when I first got my hands on the iPhone in 2007, we risk failing to anticipate the potential trajectories of our AI-inspired future by focusing too much on the current versions of products like Microsoft’s Bing chat, OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Bard.

This misconception can be clearly observed in what has become a new and popular media genre, best described as prompt-and-pronounce. The modus operandi is to attempt a task previously limited to humans and then push it to the limit, often disregarding the inventors’ caveats. The great sportswriter Red Smith once said that writing a column is easy: you just open a vein and bleed. But would-be pundits are now promoting a bloodless version: you just open a browser and ask. (Note: This newsletter was produced the old-fashioned way, by opening a vein.)

Typically, prompt-and-pronounce columns involve sitting down with one of these early systems and seeing how well it replaces something previously confined to the human realm. In one representative example, a New York Times reporter used ChatGPT to answer all her work communications for an entire week. The Wall Street Journal’s product reviewer decided to clone her voice (hey, we did that first!). There are dozens of similar examples.

In general, those who perform such stunts come to two conclusions: these models are amazing, but they fall woefully short of what humans do best. The emails don’t pick up on workplace nuances. The clones drag one foot in the uncanny valley. Most damningly, these text generators make things up when asked for factual information, a phenomenon known as “hallucination” that is the current curse of AI. And it’s a plain fact that the output of today’s models often has a soulless quality.

In one sense, this is frightening: Will our future world be run by flawed “mind children,” as roboticist Hans Moravec calls our digital successors? But in another sense, the shortcomings are comforting. Sure, AIs can now perform many low-level tasks and are unparalleled at suggesting plausible-looking Disneyland trips and gluten-free dinner menus, but the bots will always need us to make corrections and liven up the prose.
