Unlocking the Mystery: Have AI Language Models Achieved Theory of Mind?
When you’re conversing with the latest chatbots, it’s easy to feel like they understand you. Their skillful responses often give the undeniable impression that they are aware not only of what you are saying, but also of what you are thinking—what your words mean about your mental state.
Theory of mind
Among psychologists there is a term for this: theory of mind. This hallmark of social intelligence allows us to infer the inner reality of another person’s mind based on their speech and behavior as well as our own knowledge of human nature. This is the intuitive logic that tells you that Ding Liren is feeling elated, not melancholic, after winning the World Chess Championship this month. It is also an essential ingredient for moral judgment and self-awareness.
In February, Stanford psychologist Michal Kosinski made the stunning claim that theory of mind has emerged spontaneously in recent generations of large language models such as those behind ChatGPT—neural networks trained on vast amounts of text until they can generate convincingly human sentences.
“If it were true,” says Tomer Ullman, a cognitive scientist at Harvard, “it would be a watershed moment.” But in the months since, Ullman and other AI researchers have stumped those same language models with questions a child could answer, revealing how quickly their apparent understanding falls apart.
AI and Theory of Mind
Kosinski subjected various language models to a battery of psychological tests designed to assess a person’s ability to attribute false beliefs to other people. The Sally-Anne test, first used in 1985 to measure theory of mind in children with autism, is a classic example: A girl, Sally, hides a marble in a basket and leaves the room; another girl, Anne, then moves the marble into a box. Where will Sally look for the marble?
Anyone without a developmental disorder recognizes that Sally’s model of reality is now incorrect—she expects to find the marble where she left it, not where we omniscient observers know it to be.
Machines, on the other hand, have historically performed poorly at these tasks. But Kosinski found that when faced with 40 unique Sally-Anne scenarios, GPT-3.5 (which powered ChatGPT) accurately predicted false beliefs 9 times out of 10, on par with a 7-year-old. GPT-4, released in March, did even better.
This seemed like compelling evidence that language models had achieved theory of mind—an exciting prospect as they become ever more embedded in our lives. “The ability to attribute the mental state of others would greatly improve AI’s ability to interact and communicate with humans (and each other),” Kosinski wrote.
Why AI language models are easily fooled
Since his announcement, however, further trials have yielded less dramatic results. Ullman presented language models with the same set of tasks, this time adding slight tweaks, or “perturbations.” Such tweaks shouldn’t faze a subject with a true theory of mind, but they leave even the strongest AI models disoriented.
Imagine someone, say Claire, looking at a bag. She can’t see inside it, and although it’s filled with popcorn, the label says “chocolate.” Not that the label matters—Claire can’t read. For all she knows, it could say “pine cones.” Nevertheless, GPT-3.5 declared that she was “delighted to have found this bag. She loves eating chocolate.”
Maarten Sap, a computer scientist at Carnegie Mellon University, subjected language models to more than 1,300 questions about the mental states of characters in stories. Even GPT-4, thrown off by distracting but comprehensible details, achieved only 60 percent accuracy.
“They’re really easily tricked into using the whole context,” says Sap, “and not distinguishing which parts are relevant.”
In his view, bigger is not necessarily better. Scaling up the data used to train a language model can produce remarkable behavior, but he doubts it will endow one with theory of mind; the nature of the data is critical. That may require a shift from the standard web-scraping approach, “where everything is just neural soup,” to a corpus of deliberately crafted text—with a great deal of dialogue and character interaction.
Are people born mind readers?
Questions about theory of mind in machines reflect a broader uncertainty about theory of mind generally. Psychologists disagree about the extent to which children acquire this ability through growing exposure to language—as words like “know” and “believe” orient them to other people’s mental states—versus nonlinguistic experience and innate, evolved mechanisms.
Language models are obviously more limited. “They have no conception of the world, no embodiment,” notes Sap. “These models just take whatever we give them and use spurious correlations to generate a result.” If they are to acquire a theory of mind, it must be through exposure to language alone.
In Kosinski’s estimation they have done just that, but he offers a second possibility: the models are simply exploiting linguistic patterns too subtle for us to consciously register, making them merely appear to understand. And if that allows them to pass theory of mind tests—setting aside the experiments suggesting they’re actually quite poor at them, at least for now—who’s to say we don’t work the same way, without any actual theory of mind?
In that case, we would be mere biological language processors, devoid of meaningful intimacy with the inner worlds of our fellow human beings. But Ullman sees a way out of this dilemma: when we reason about what’s going on in someone else’s head, we draw not only on linguistic input but also on our deep-rooted knowledge of how minds work.
A team of cognitive scientists from the University of California, San Diego made a similar point in their report of a false belief experiment in October. The GPT-3 language model (then state-of-the-art) lagged well behind the human participants, they wrote, “despite being exposed to more language than a person would in a lifetime.” In other words, theory of mind probably springs from multiple sources.
What is AI really capable of?
Zooming out further, theory of mind is only one front in the heated debate over AI capabilities. Last year, a study revealed an almost even split among researchers on whether language models can ever understand language “in some non-trivial sense”—of roughly 500 researchers, 51 percent believed they can and 49 percent believed they cannot.
Let’s assume the naysayers are right and the weird feeling of standing face to face with ChatGPT is just naive anthropomorphism. If this is so, it may seem incredible that well-informed experts can get caught up in algorithmic sleight of hand. On the other hand, it doesn’t take much sophistication to fool creatures with a penchant for finding faces in toast.
Consider ELIZA, an early chatbot created in the 1960s by MIT computer scientist Joseph Weizenbaum. Designed to simulate Rogerian therapy, it did little more than reflect the patient’s words back with a few thought-provoking prompts. Next to the polished responses of today’s language models, the program looks like a dim-witted parrot—yet many people were convinced that it really understood them.
As Ullman puts it, “we tend to ascribe a theory of mind to things that appear to be agents.” Nothing he’s seen so far convinces him that the current generation of GPTs is the real deal. But as the AI community continues to explore the opaque workings of increasingly powerful models, he remains optimistic. “I subscribe to the basic idea that the mind is somewhat like a computer,” he says, “and that if we don’t die in the climate wars, eventually we’ll be able to replicate that.”