What does it really mean for a machine to be intelligent?
In 1950, Alan Turing proposed his famous Imitation Game as a test for machine intelligence. The test was simple: could a computer, through text alone, fool a human into believing it was also human? For decades, this remained a distant goal. Today, large language models (LLMs) have, by many measures, passed this test. ChatGPT can effortlessly write a sonnet on demand, a task Turing himself proposed in his original paper.
But does that make it intelligent?
We present our take on this question in a forthcoming chapter of the book Designing an Intelligence, edited by George Konidaris. We argue that despite this remarkable linguistic fluency, something profound is still missing. While LLMs can manipulate language with superhuman skill, we believe that they are not truly intelligent. They are more like incredibly sophisticated "calculators" for words. Turing's core idea, that disembodied language use is a sufficient test for intelligence, is false.
The Elephant in the Room: Embodiment
As robotics pioneer Rodney Brooks once noted, elephants don't play chess. They don’t write sonnets, either. Yet, we all agree that an elephant is an intelligent creature. It has goals, it makes plans, and it engages in complex, goal-directed behavior in the physical world.
Consider the crow that painstakingly drops rocks into a tube of water to raise the level high enough to drink. This is a clear display of intelligence—understanding cause and effect, interacting with the world, and taking action to achieve a goal, all without a word of human language.
This is the missing piece: embodiment. A truly intelligent agent must be a computational system embedded in space and time, with high-dimensional sensors to perceive the world and high-dimensional motors to act within it. Intelligence can't just be about processing the internet; it has to be about processing the world.
A New Benchmark: The Grounded Turing Test
If the original Turing Test is no longer sufficient, what should replace it? We propose a new benchmark: the Grounded Turing Test.
To pass this test, an AI must be embodied in a robot that can use language in a way that is fundamentally grounded in the physical world. It’s not enough to just talk. The robot’s success or failure is defined by its physical and behavioral response to language. The test requires a fluid, collaborative dialogue where the robot demonstrates a deep connection between words, perception, and action.
What would this look like in practice? The Grounded Turing Test is made up of a whole suite of linguistic capabilities. Here are just a few examples, followed by a toy sketch in code:
Interpreting Instructions: You could tell the robot, "Pick up the red block," and it would need to perceive the block and physically pick it up.
Understanding the World: You could state, "The red block is on the table," and the robot should update its internal model of the world, so that it can use that information later to find the block.
Asking for Help: If the robot can't reach the block, it should be able to ask you, "Can you give me the red block?"
Explaining its Actions: If you ask, "Why did you drive to the table?" it should be able to explain its reasoning: "You want me to pick up the red block, and you told me that the red block is on the table."
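To make the first few capabilities concrete, here is a minimal sketch in Python of a grounded interpreter. Everything in it is an illustrative assumption on our part, not code from the chapter: a real robot would ground these symbols in high-dimensional perception and motor control rather than in string matching.

```python
import re

class ToyGroundedRobot:
    """Toy interpreter: grounds utterances in a symbolic world model.

    Illustrative sketch only. Real grounding would connect words to
    perception and action, not to regular expressions.
    """

    def __init__(self):
        self.locations = {}  # stated facts, e.g. {"red block": "table"}

    def handle(self, utterance):
        text = utterance.lower().rstrip(".")
        # "The red block is on the table" -> update the world model.
        m = re.match(r"the (.+) is on the (.+)", text)
        if m:
            obj, place = m.groups()
            self.locations[obj] = place
            return f"Noted: the {obj} is on the {place}."
        # "Pick up the red block" -> act, or ask for help if we can't.
        m = re.match(r"pick up the (.+)", text)
        if m:
            obj = m.group(1)
            place = self.locations.get(obj)
            if place is None:
                return f"Can you tell me where the {obj} is?"
            return f"Driving to the {place} and picking up the {obj}."
        return "Sorry, I didn't understand that."

robot = ToyGroundedRobot()
print(robot.handle("Pick up the red block."))          # asks for help
print(robot.handle("The red block is on the table."))  # updates world model
print(robot.handle("Pick up the red block."))          # now acts
```

The point of the toy is the loop, not the string matching: a statement changes the robot's model of the world, and that changed model changes how a later instruction is carried out.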
The Path to Truly Intelligent Robots
Building a system that can pass the Grounded Turing Test is the grand research challenge for our field. It requires us to move beyond static, pre-collected datasets and develop AI that can learn continuously from a real-time stream of high-dimensional sensory input. In the chapter, we outline a technical roadmap for achieving this goal, proposing a unified framework that we call the Human-Robot Collaborative POMDP. This framework models the physical world, the human’s mental state, and the robot’s actions within a single decision-theoretic model.
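For readers who want a concrete picture of the formalism, the sketch below writes out a standard POMDP tuple (S, A, Ω, T, O, R, γ) and its belief update in Python. The factoring of the state into world, human mental state, and robot components follows the description above, but the names and structure are our own illustrative assumptions, not the chapter's actual definitions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CollaborativePOMDP:
    """Standard POMDP tuple; the state factoring is an assumption.

    Each state is imagined as (world_state, human_mental_state, robot_state),
    actions include both physical motions and speech acts, and observations
    include both sensor readings and human utterances.
    """
    states: list
    actions: list
    observations: list
    transition: Callable    # T(s, a, s') -> probability
    observe: Callable       # O(s', a, o) -> probability
    reward: Callable        # R(s, a) -> float
    discount: float = 0.95  # gamma

def belief_update(belief, action, obs, pomdp):
    """Bayes filter: fold an action and an observation into the belief.

    `belief` maps states to probabilities; the robot never sees the true
    state (e.g., the human's goal), only evidence about it.
    """
    new_belief = {}
    for s2 in pomdp.states:
        predicted = sum(pomdp.transition(s, action, s2) * p
                        for s, p in belief.items())
        new_belief[s2] = pomdp.observe(s2, action, obs) * predicted
    total = sum(new_belief.values())
    return ({s: p / total for s, p in new_belief.items()}
            if total > 0 else belief)
```

Seen this way, a human statement like "The red block is on the table" is simply an observation that shifts the robot's belief, and a question like "Can you give me the red block?" is an action the robot chooses because it expects the answer, or the help, to improve its situation.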
Ultimately, the quest for AI is not about creating a better chatbot. It's about understanding the nature of intelligence itself. The beauty of language isn't in the words themselves, but in what those words represent: high-dimensional sensory inputs, collected over time through active interaction with the physical world.
We imagine a future not where robots replace us, but where they become our collaborators, augmenting our own abilities and making us more productive as a species. This journey starts with setting the right goal—a benchmark that captures the rich, embodied, and interactive nature of true intelligence.
We hope you'll join the conversation and look for Designing an Intelligence when it arrives in 2026.
Your post reminded me of an Oliver Brock paper, "The Work Turing Test," which adds some ideas about organizing the test to address a breadth of tasks. (It isn't easy to find, but Oliver has promised to post it to arXiv.)
One difference is the connection to language that you propose. Here is an issue I am curious about. I would say that our conscious processes have only a tenuous grasp of what our subconscious physical intelligence (the "inner robot"?) is doing. If you ask somebody how they do something, they are likely to expose a limited understanding. I am surprised that, when somebody gives a simplistic answer, the inner robot doesn't smack its forehead in frustration. But I guess the inner robot doesn't understand the answer. Anyway, since your proposal straddles language and physical intelligence, maybe it would shed some light on the gap between them.
The argument you are making, that disembodied intelligent use of language is not 'true' intelligence, has also been made repeatedly by Yann LeCun, the former head of Meta's AI effort. As it happens, in two recent posts on my blog, I asked a couple of AI chatbots what they thought of this argument.
Both Kimi K2 and ChatGPT-5 gave a pretty good demonstration that they understood the issues, and I think both provided a nuanced answer: physical grounding of language may be nice when discussing block worlds or cats on mats, but it is not strictly necessary when discussing ideas. I report the responses of these two chatbots verbatim; Claude, Gemini, DeepSeek, and ChatGPT 4.5 also gave quite good responses to this question, as well as to some others.