Your post reminded me of an Oliver Brock paper "The Work Turing Test," with some additional ideas about organizing the test to address a breadth of tasks. (It isn't easy to find, but Oliver is promising to ArXiv it.)
One difference is the connection to language that you propose. Here is an issue I am curious about. I would say that our conscious processes have only a tenuous grasp of what our subconscious physical intelligence (the "inner robot"?) is doing. If you ask somebody how they do something, they are likely to reveal only a limited understanding. I am surprised that, when somebody gives a simplistic answer, the inner robot doesn't smack its forehead in frustration. But I guess the inner robot doesn't understand the answer. Anyway, since your proposal straddles language and physical intelligence, maybe it would shed some light on the gap between them.
Would love to read his paper! We cited other variants of the Turing Test in our chapter, but if he hasn't posted his, then I feel less bad about missing it. I agree that some of what our inner robot is doing is not accessible to our conscious language mind. But I believe a surprising amount is. As a recent example, you coined the term "ulnar grasp" and now you see ulnar grasps everywhere. I think many (but not all) of the things our inner robot is doing are like that - we can create and define a language to talk about them and bring them into our consciousness.
The argument you are making - that disembodied intelligent use of language is not 'true' intelligence - has also been made repeatedly by Yann LeCun, the former head of Meta's AI effort. As it happens, in two recent posts on my blog, I asked a couple of AI chatbots what they thought of this argument.
Both Kimi K2 and ChatGPT-5 gave a pretty good demonstration that they understood the issues. And I think that they both provided a nuanced answer that physical grounding of language may be nice when discussing block worlds or cats on mats, but that it is not strictly necessary when discussing ideas. I report verbatim the responses of these two chatbots to the question, but Claude, Gemini, DeepSeek, and ChatGPT 4.5 all also gave quite good responses to this same question, as well as some others.
Which post? I couldn't tell from the titles. I agree that it's not strictly necessary when discussing ideas. And that's something that really surprised me - I would have bet against it, and I would have been wrong. But I think there is a HUGE gap between discussing ideas and going out and building them in the physical world. Maybe we could even quantify it in a theory of computation sense, in terms of the dimensionality of the input space. Words coming in as tokens vs images coming in at camera frame rates. Words coming out as tokens vs actions coming out at 100 Hz for minutes or hours or days.
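To make that gap concrete, here is a rough back-of-the-envelope comparison; every number below is an illustrative guess (streaming rate, camera resolution, joint count), not a measurement:

```python
# Ballpark comparison of input/output rates for a chatbot vs an embodied agent.
# All numbers are assumptions chosen for illustration.

tokens_per_sec = 10                      # assumed text streaming rate
chatbot_rate = tokens_per_sec            # ~10 discrete symbols per second

frames_per_sec = 30                      # camera frame rate
pixels_per_frame = 640 * 480 * 3         # one RGB VGA frame
camera_rate = frames_per_sec * pixels_per_frame   # ~27.6 million values/sec

control_hz = 100                         # motor command rate
joints = 7                               # e.g., a 7-DoF arm
action_rate = control_hz * joints        # 700 values/sec, sustained for hours

print(f"chatbot text:  ~{chatbot_rate:,} values/sec")
print(f"camera input:  ~{camera_rate:,} values/sec")
print(f"action output: ~{action_rate:,} values/sec")
```

Raw dimensionality is not the same thing as information content, but the roughly six orders of magnitude on the input side is the kind of gap I mean.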
Sorry. The two posts I was referring to were titled "Grok 4 and Kimi K2", and "ChatGPT 5 Drops". Search for the string "LeCun" in each to find the passages I meant.
I think you are right, though, to point out that embodied human intelligence routinely deals with input bandwidths far higher than what a chatbot has to deal with. But, ironically, that is precisely why it is the white-collar jobs that AI is currently threatening, rather than the higher-bandwidth blue-collar jobs!
Very cool, I took a look. So I think there is a difference between text about physical grounding of language, and actually physically grounding language to produce goal-directed behavior in the physical world. To riff on one of the examples Kimi gave, if someone says "this coffee is cold" to a waiter, the waiter is supposed to infer that coffee is supposed to be hot, and then go to the kitchen to get hot coffee. Carrying out these actions in the physical world requires processing high-dimensional, high-framerate perceptual input such as camera images, and producing high-dimensional, high-framerate movements to go to the kitchen, fill a mug with hot coffee, and bring it back to the customer.
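To sketch what that means computationally, here is the shape of the loop such a robot waiter would have to run; every function below is a hypothetical placeholder for illustration, not a real robot API:

```python
import time

# Hypothetical placeholders for illustration only -- not a real robot API.
def get_camera_frame():            return "frame"          # latest RGB image
def update_world_model(s, frame):  return {"mug": "seen"}  # object poses, obstacles
def plan_step(s, goal):            return [0.0] * 7        # desired joint velocities
def send_joint_command(cmd):       pass                    # low-level motor interface

goal = "refill the mug with hot coffee and bring it back"
PERCEPTION_HZ, CONTROL_HZ = 30, 100
state = {}

# One simulated second of the loop; the real task runs this for minutes.
for tick in range(CONTROL_HZ):
    if tick % (CONTROL_HZ // PERCEPTION_HZ) == 0:     # update percepts at ~30 Hz
        state = update_world_model(state, get_camera_frame())
    send_joint_command(plan_step(state, goal))        # command motors at 100 Hz
    time.sleep(1.0 / CONTROL_HZ)
```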
AI would certainly be more useful if it could use language and reasoning in a way that's grounded in the physical world. But I'm not sure I agree that something like the red block test you propose is essential to establish its being intelligent in a way that really does justice to what we mean by the word. I think I might be satisfied that it was an intelligent being if it were possible to teach it something in more or less the same way that one would teach a person.
For instance, suppose there were a game somewhat like chess that the AI had never played, and that was not mentioned at all in its training data. Would it be possible to teach it to be a good player just by talking to it? You tell it the rules, ask if it has any questions about how the game works, then you play a couple of games. Then you start talking about strategy. So you tell it, "your blue pieces give you power, but the red pieces give you flexibility, ways of surprising your opponent. It's not too bad to have a bit fewer red pieces than your opponent, but it's very important to have at least as many blue pieces as he does." So then you set up the board in a way that gives the AI a choice to become one down in red or one down in blue, and ask it what its next move would be. Maybe a while later you show it one clever trick you can play with blue, then ask if it can think of another one. Later, you tell it about styles of play: some opponents move in slow, subtle ways towards dominating the board, others push grimly towards the endgame. You give it an example of each style of play. Then you ask it to play one game against you in the slow, subtle style, and another in the grim-charge-towards-the-endgame style. And so on.
Current AI's intelligence is mostly a giant mnemonic trick: it's learned by seeing a million examples of everything. And of course it could learn to excel at the imaginary game with red and blue pieces in the same way. But if it could learn by instruction, of the kind that works with people, I think that would also satisfy me that it was genuinely intelligent.
The red block test is meant as an illustrative example; I don't think any "point solution" that handled just those examples would pass our test. We just used it to give the flavor of the sorts of language use we were getting at.
Your idea about teaching a game with language reminds me of Branavan's work on learning to play Civilization II better by reading the instruction manual (Branavan, S. R. K., David Silver, and Regina Barzilay. "Learning to win by reading manuals in a Monte-Carlo framework." Journal of Artificial Intelligence Research 43 (2012): 661-704) and even older work by Chapman and Agre (Agre, Philip E., and David Chapman. "What are plans for?" Robotics and Autonomous Systems 6.1-2 (1990): 17-34). I suspect that LLMs can do this, or nearly do this, now in "text space", and for me that doesn't pass our bar. We're planning a post that gets at this using chess as an example: there are many different perceptual ways a chess game state can look, but they all boil down to the same low-dimensional state representation.
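To preview the chess point with a toy example: a position can be rendered in countless high-dimensional ways, but the state underneath is tiny. Here is a plain-Python sketch (no chess library; just the standard FEN string for the starting position):

```python
# The same chess position can appear as a photo of a wooden set, an app
# screenshot, or printed ASCII -- millions of pixels each time -- yet the
# underlying state fits in a few dozen symbols.

# Starting position in FEN: piece placement, side to move, castling rights,
# en-passant square, half-move and full-move counters.
start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Equivalent 8x8 board array built from the placement field.
board = []
for row in start_fen.split()[0].split("/"):
    squares = []
    for ch in row:
        if ch.isdigit():
            squares.extend(["."] * int(ch))   # run of empty squares
        else:
            squares.append(ch)                # piece letter
    board.append(squares)

print(len(start_fen), "characters of state")          # ~56 characters
print(*(" ".join(r) for r in board), sep="\n")
# A single 640x480 RGB rendering of this same position is ~900,000 numbers.
```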
I understand that your red ball examples are just a stand-in for a full test — just a sample of the things you would ask the AI to do.
You are right that chess and similar games are low-dimensional worlds. There are only a few things to know — which pieces are on the board, and what row and column each is in. But it seems like it would be possible to teach an instructable AI all kinds of things by starting the instruction with low-dimensional worlds. For instance you could teach it to recognize and move a red ball by showing it a low-dimensional version of the situation — a cartoon where the ball is a red circle, the surface it’s on is a brown rectangle, etc., then once it understands that, add more dimensions to the info by using photos, explaining that they show the real-world equivalents of what the cartoon showed. So instructability seems like a route by which AI could become intelligent in your sense, as well.
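That staging could be written down as a curriculum. Here is a hypothetical sketch of what I mean; the stages, thresholds, and the train_on/evaluate placeholders are all made up for illustration, not a real training API:

```python
# Hypothetical curriculum: teach "find and move the red ball" on
# low-dimensional inputs first, then carry the same concepts into photos.

curriculum = [
    ("cartoon",     "64x64 flat-color drawings",  "red circle = ball, brown rectangle = surface"),
    ("rendered 3D", "synthetic renders",          "same objects with lighting and occlusion"),
    ("photos",      "real camera images",         "the real-world equivalents of the cartoon"),
]

def train_on(model, stage):  return model   # placeholder
def evaluate(model, stage):  return 0.95    # placeholder: task success rate

model = object()                            # stand-in for whatever the learner is
for name, inputs, lesson in curriculum:
    print(f"stage '{name}': {inputs} -- {lesson}")
    model = train_on(model, (inputs, lesson))
    if evaluate(model, (inputs, lesson)) < 0.9:   # only advance once it "gets it"
        print("  needs more instruction at this stage")
        break
```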
Whether it is possible to make AI more instructable I do not know. Somehow implanting a big-picture understanding of things and the rules that govern them is the approach that was originally used in attempts to build AI, and it failed. (Is that what the articles you cite are about?) But whether it is possible or not, I continue both to think and to feel intuitively that being instructable in my sense is the best test of what we mean by intelligence.
< I suspect that LLMs can do this or nearly do this now in "text space"
Really? I asked GPT-4o whether current AI could do this, and this is what it said.
<Your thought experiment about teaching an AI a complex game like chess purely through human-like instruction (rules, examples, strategic advice, styles of play) gets at a central question in current AI capabilities: Can modern AI models learn abstract structure and develop competence through explanation and interaction, rather than brute-force data exposure?
Bottom line: No current AI can become a highly skilled player of a complex game solely through human-style instruction and feedback.
But:
• It can simulate understanding well enough to follow rules and play passably.
• It can mimic strategic language and improve short-term behavior within a session.
• It gives a convincing illusion of learning — but it doesn’t actually develop skill in the way even a serious amateur human does.
But this is simulation of learning, not true cumulative learning:
• There's no stable model being updated under the hood.
• Any "learning" fades if you start a new session, unless you re-teach it.
• It doesn't self-correct in the way a human would ("Ah — I lost blue pieces early, and that always goes badly").
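The distinction it is pointing at, advice held only in the session context versus a persistent update to the model itself, can be drawn as a toy contrast (this is an illustration, not any real chatbot's internals):

```python
# Toy contrast between advice that lives only in the conversation context
# and learning that persists in the model's parameters.

class ToyChatbot:
    def __init__(self):
        self.weights = {}      # persistent "parameters"
        self.context = []      # wiped at the start of every new session

    def tell(self, advice):
        self.context.append(advice)            # in-context: followed for now

    def fine_tune(self, advice):
        self.weights[advice] = self.weights.get(advice, 0) + 1   # persists

    def new_session(self):
        self.context = []                      # the in-context "learning" is gone

bot = ToyChatbot()
advice = "keep at least as many blue pieces as your opponent"
bot.tell(advice)
print("knows it in-session:     ", advice in bot.context)   # True
bot.new_session()
print("remembers it after reset:", advice in bot.context)   # False
bot.fine_tune(advice)
bot.new_session()
print("weight update survives:  ", advice in bot.weights)   # True
```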
I understand your bias towards focusing on "chess" as the "goal", but what we are talking about is creating AI that learns from high-dimensional information to build the base abstractions everything else is built on. We would not suggest computer scientists start with games like these, which are complex in the base-abstraction sense, but instead build a curriculum that naturally results in those abstractions being built. We need to understand that a child builds this translation over years of experience, and we can build systems that create the same abstractions too.
I agree with this and would add sociability and mutual attention as ongoing in-the-world challenges.
Thank you for commenting! Welcome! Totally agree with both, especially mutual attention. We are doing some work in my lab with human-dog interaction as a model for human-robot interaction in collaboration with Daphna Buchsbaum (https://sites.brown.edu/cocodevlab/). One thing we observed from watching people and their dogs is that it's *all* about establishing joint attention.
Very cool! 🐶