<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[What to Tell the Robot]]></title><description><![CDATA[Robotics, Language, Space, and Time, written by Stefanie Tellex and David Watkins.]]></description><link>https://whattotelltherobot.com</link><image><url>https://substackcdn.com/image/fetch/$s_!0Mfu!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee6e279-53b0-4949-804e-4f7aa106f40a_727x727.png</url><title>What to Tell the Robot</title><link>https://whattotelltherobot.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 18 May 2026 04:22:37 GMT</lastBuildDate><atom:link href="https://whattotelltherobot.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Stefanie Tellex and David Watkins]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[whattotelltherobot@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[whattotelltherobot@substack.com]]></itunes:email><itunes:name><![CDATA[Stefanie Tellex]]></itunes:name></itunes:owner><itunes:author><![CDATA[Stefanie Tellex]]></itunes:author><googleplay:owner><![CDATA[whattotelltherobot@substack.com]]></googleplay:owner><googleplay:email><![CDATA[whattotelltherobot@substack.com]]></googleplay:email><googleplay:author><![CDATA[Stefanie Tellex]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How to Build Safe AI (Without Making the AI Safe)]]></title><description><![CDATA[Lessons from aviation, manufacturing, and the 16-year-old driver]]></description><link>https://whattotelltherobot.com/p/how-to-build-safe-ai-without-making</link><guid 
isPermaLink="false">https://whattotelltherobot.com/p/how-to-build-safe-ai-without-making</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Wed, 13 May 2026 16:27:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/bd7c780e-6718-40ce-aea4-9f33d1ce2c5c_185x66.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As agentic LLMs are widely deployed to assist with all sorts of tasks, it is critical to make sure they are as safe as a Boeing 747.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LZlz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LZlz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LZlz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LZlz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LZlz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!LZlz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg" width="728" height="259.7189189189189" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:66,&quot;width&quot;:185,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A frame of video taken immediately before a midair collision between a Piper PA-32R and a Eurocopter AS350 that occurred over the Hudson River in 2009.  Source: NBC News / MSNBC&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A frame of video taken immediately before a midair collision between a Piper PA-32R and a Eurocopter AS350 that occurred over the Hudson River in 2009.  Source: NBC News / MSNBC" title="A frame of video taken immediately before a midair collision between a Piper PA-32R and a Eurocopter AS350 that occurred over the Hudson River in 2009.  
Source: NBC News / MSNBC" srcset="https://substackcdn.com/image/fetch/$s_!LZlz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LZlz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LZlz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LZlz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8d21d65-f802-411c-929d-e3e5f66c6c75_185x66.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">A frame of video taken immediately before a midair collision between a Piper PA-32R and a Eurocopter AS350 that occurred over the Hudson River in 2009. Source: NBC News / MSNBC</figcaption></figure></div><p>Existing approaches focus on providing safety guarantees for the LLM itself, a task that is challenging and perhaps impossible. Think of an LLM as a lens. Depending on the direction, phase, and wavelength of the incoming light, it refracts differently. Like a viroid, an LLM is a <a href="https://whattotelltherobot.com/p/consciousoids">consciousoid</a>, and how it behaves depends on the light hitting it: the high-dimensional input from its interlocutor. We do not yet have a closed-form model for this projection, so determining whether a given input produces a given output requires running an empirical test. 
The LLM&#8217;s high-dimensional input space, output space, and parameter space make it fundamentally hard to certify.</p><p>Instead, I propose a systems approach: treat the LLM as one component of a larger system. The LLM itself can never be safe; in fact, we do not trust it at all. But the system it is part of can carry safety guarantees if it follows the practices in ISO and IEC standards for safety, security, and risk. These same practices are what let us have airplanes in the sky, cars on the road, and robots in factories.</p><p>The methodology is straightforward: safety-rated guardrails at the input/output boundaries of the system ensure safe operation. To be verifiable to, say, a one-in-100,000 probability of failure, as specified in IEC 61508, these guardrails must be low-dimensional relative to the LLM. For example, to be certified by <a href="https://en.wikipedia.org/wiki/T%C3%9CV">T&#220;V</a>, a safety-rated E-stop on a factory robot must guarantee that, when pressed, it fails to stop the robot with a probability of at most 10<sup>-5</sup>. Another low-dimensional guardrail is even simpler: a check that the system cannot write outside a specified directory. 
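</p><p>To make the directory guardrail concrete, here is a minimal Python sketch, assuming every file write the agent attempts is funneled through a single checkpoint. The function name <code>is_write_allowed</code> and the sandbox path are hypothetical, not part of any particular agent framework. The check is low-dimensional in the sense above: one path in, one boolean out.</p>

```python
from pathlib import Path


def is_write_allowed(target: str, allowed_root: str) -> bool:
    """Hypothetical guardrail: permit a write only if `target`
    resolves to a location inside `allowed_root`."""
    root = Path(allowed_root).resolve()
    # Relative targets are taken relative to the sandbox; an absolute
    # target replaces `root` entirely when joined with the / operator.
    # resolve() collapses ".." segments and follows existing symlinks.
    resolved = (root / target).resolve()
    # Compare whole path components rather than string prefixes, so a
    # sibling such as /tmp/agent_sandbox_evil is rejected.
    # Path.is_relative_to requires Python 3.9 or later.
    return resolved.is_relative_to(root)


sandbox = "/tmp/agent_sandbox"
print(is_write_allowed("notes/todo.txt", sandbox))             # True
print(is_write_allowed("../etc/passwd", sandbox))              # False
print(is_write_allowed("/etc/passwd", sandbox))                # False
print(is_write_allowed("/tmp/agent_sandbox_evil/x", sandbox))  # False
```

<p>A check like this still races with the write itself, since the filesystem can change between the check and the use, so in a certified system it would sit alongside OS-level enforcement such as file permissions or a restricted container mount rather than replace it.</p><p>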
As dimensionality increases, guardrails may become probabilistic; for example, a classifier with a probabilistic guarantee could validate a bash script before it executes.</p><p>The highest-dimensional and noisiest channel of all is the one between the human and the LLM. Prompted by the LLM, the human could go off and do, well, anything, including quite horrible things. It is therefore not possible to have ISO-style safety guarantees around the human-LLM interaction. A more realistic approach is to compare human-LLM interaction to human-human interaction, which spans a wide spectrum. Our society tolerates this sort of risk: cult leaders, abusers, and scammers exist and are sometimes successful. To combat it, we use a combination of interventions that maintain our societal structures: law, health care, education, and more. We take this approach today when we put a 16-year-old driver behind the wheel. The car itself is constructed and certified according to these safety standards. The 16-year-old is not, so we educate them, require a teacher in the car, and eventually let them drive. (And we tolerate car accidents being the leading cause of death in that age group.)</p><p>Here is where the analogy to humans starts to break down. The residual risk society tolerates from scammers and cult leaders is calibrated to human time constants. A scammer takes days or weeks to build trust with one victim. Our interventions (law, education, and social institutions) evolved against that clock. LLMs run the same loop in seconds, in parallel, across millions of targets, exploiting the very mechanisms on which humanity has built its social structures. To make this precise, consider the total rate of societal harm from a class of agents:</p><p>$$R_{\text{total}} = N\,r\,p\,(1 - v)$$</p><p>where N is the number of deployed agents, r is the number of interactions per unit time per agent, p is the probability of a harmful outcome per interaction, and v is the verification coverage of our guardrails, that is, the fraction of harmful outcomes they catch. 
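</p><p>Plugging in numbers shows how the product behaves. The Python sketch below uses parameter values that are made up purely for illustration, not measurements: it gives the LLM class a per-interaction harm probability 100 times lower than the human class plus 90% verification coverage, and lets N &#215; r do the rest.</p>

```python
def total_harm_rate(n_agents: float, rate: float, p_harm: float, coverage: float) -> float:
    """R_total = N * r * p * (1 - v): expected harmful outcomes per unit
    time that get past guardrails with verification coverage v."""
    return n_agents * rate * p_harm * (1.0 - coverage)


# Hypothetical human scammers: few agents, slow, no automated verification.
human = total_harm_rate(n_agents=10_000, rate=5, p_harm=1e-2, coverage=0.0)

# Hypothetical LLM deployment: 100x lower p and 90% coverage, but a
# vastly larger N * r (interactions per day across all instances).
llm = total_harm_rate(n_agents=1_000_000, rate=1_000, p_harm=1e-4, coverage=0.9)

print(f"human: {human:,.0f} harmful outcomes/day")  # human: 500 harmful outcomes/day
print(f"llm:   {llm:,.0f} harmful outcomes/day")    # llm:   10,000 harmful outcomes/day
```

<p>The point is structural rather than numerical: because R<sub>total</sub> is a product, a large enough N &#215; r can swamp both a smaller p and a high v.</p><p>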
For humans, N &#215; r is bounded by population and biology. For LLMs, N &#215; r is bounded instead by the inference throughput of available data centers, the bandwidth of the internet, and the size of the context window the model can be given. Those bounds are rising fast, and none of them shares a ceiling with human biology. Even if p is lower for an LLM than for a skilled human scammer, R<sub>total</sub> can exceed what our existing interventions were calibrated to absorb.</p><p>This is where LLM speed, the thing that creates the problem, also points toward the solution. Two things scale with the time budget t per interaction: utility to the user, which falls as interactions get slower, and verification coverage, which rises as we get more time to check outputs. Model them as:</p><p>$$U(t) = e^{-\alpha t}, \qquad v(t) = 1 - e^{-kt}$$</p><p>where &#945; is how much users lose per unit of added latency, and k is the verification efficiency. Net value per interaction is:</p><p>$$V(t) = U(t)\,v(t) = e^{-\alpha t}\left(1 - e^{-kt}\right)$$</p><p>Setting the derivative to zero,</p><p>$$\frac{dV}{dt} = -\alpha e^{-\alpha t} + (\alpha + k)\,e^{-(\alpha + k)t} = 0 \quad\Longrightarrow\quad e^{-kt} = \frac{\alpha}{\alpha + k},$$</p><p>so maximizing V gives an optimal time budget:</p><p>$$t^{*} = \frac{1}{k}\ln\left(1 + \frac{k}{\alpha}\right)$$</p><p>This equation has the shape of a classic speed-accuracy tradeoff. The (1 &#8722; e^(&#8722;kt)) term is Wickelgren&#8217;s function from 1977, and the optimization of deliberation time against decaying utility has been solved in drift-diffusion models, in neuroscience, and in economics. The math is not new. The reframing is: k is the efficiency of a parallel verifier, not an individual&#8217;s evidence accumulation, and &#945; is society&#8217;s aggregate latency tolerance, not personal opportunity cost. 
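</p><p>A quick numerical check of the closed form, as a Python sketch: maximize V(t) = e<sup>&#8722;&#945;t</sup>(1 &#8722; e<sup>&#8722;kt</sup>) by brute force on a fine grid and compare the result against t* = (1/k) ln(1 + k/&#945;). The grid range and the choice &#945; = 1 are arbitrary assumptions for illustration.</p>

```python
import math


def net_value(t: float, alpha: float, k: float) -> float:
    """V(t) = U(t) * v(t), with U(t) = exp(-alpha t) and v(t) = 1 - exp(-k t)."""
    return math.exp(-alpha * t) * (1.0 - math.exp(-k * t))


def optimal_budget(alpha: float, k: float) -> float:
    """Closed-form maximizer t* = (1/k) ln(1 + k/alpha)."""
    return math.log1p(k / alpha) / k


def grid_argmax(alpha: float, k: float, t_max: float = 10.0, steps: int = 100_000) -> float:
    """Brute force: evaluate V on an evenly spaced grid and return the best t."""
    grid = (t_max * i / steps for i in range(1, steps + 1))
    return max(grid, key=lambda t: net_value(t, alpha, k))


alpha = 1.0
for k in (0.1, 1.0, 10.0):
    t_star = optimal_budget(alpha, k)
    print(f"k/alpha = {k / alpha:4.1f}   closed-form t* = {t_star:.3f}   "
          f"grid t* = {grid_argmax(alpha, k):.3f}   V(t*) = {net_value(t_star, alpha, k):.3f}")
```

<p>As k/&#945; grows from 0.1 to 10, t* drops from about 0.95 to about 0.24 while the peak value V(t*) rises from about 0.035 to about 0.72: better verifiers raise the peak and shift it earlier.</p><p>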
The tradeoff psychology studied one decision at a time reappears as an engineering problem at the datacenter scale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eoQm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eoQm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png 424w, https://substackcdn.com/image/fetch/$s_!eoQm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png 848w, https://substackcdn.com/image/fetch/$s_!eoQm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png 1272w, https://substackcdn.com/image/fetch/$s_!eoQm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eoQm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png" width="1456" height="887" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:887,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eoQm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png 424w, https://substackcdn.com/image/fetch/$s_!eoQm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png 848w, https://substackcdn.com/image/fetch/$s_!eoQm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png 1272w, https://substackcdn.com/image/fetch/$s_!eoQm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45a6c003-fe94-4a92-b8c5-512a0698cfe9_1974x1203.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The ratio k/&#945; is what determines whether slowing down is worth it. When k &#8810; &#945;, verifiers are too slow to matter, and the system should just optimize for speed. When k &#8811; &#945;, verifiers can do real work in the time budget, and the optimum shifts toward more verification. Humans cannot exploit this tradeoff because we do not have concurrent processes running checks on our own cognition at millisecond latency. LLMs can. While one process drafts an output, another simulates consequences, validates against policy, or cross-checks with a second model. Better verifiers, meaning higher k relative to &#945;, raise the peak and shift it earlier: safety and throughput improve together. The old interventions assumed a serial agent with no spare cycles. LLMs are not that kind of agent.</p><p>This math means the systems-safety playbook applies, and it needs to be extended. 
We still want safety-rated guardrails at system boundaries, the way we do for factory robots and aircraft. We also need a new category of intervention that exploits the latency budget LLM speed creates, and that treats verification coverage as a first-class engineering target rather than an afterthought.</p><p>For LLMs, we need to develop these interventions while also understanding that no intervention can make an LLM, on its own, safe enough for an airplane cockpit. In my introduction to robotics course, I ask my students to read the NTSB accident report on the <a href="https://en.wikipedia.org/wiki/2009_Hudson_River_mid-air_collision">2009 Hudson River mid-air collision</a> between a helicopter and a small airplane. One contributing factor was that the FAA air traffic controller was engaged in a &#8220;non-pertinent phone call&#8221; and failed to correct the pilot&#8217;s incorrect readback of the Newark control tower&#8217;s radio frequency. But this sort of accident is rare precisely because we have many checkpoints: we recognize that humans themselves are untrustworthy and need layers of verification.</p><p>We know how to build safe systems around untrustworthy components. We&#8217;ve been doing it for decades in aviation, medicine, and manufacturing. The lessons apply to AI, but they are not sufficient for AI. The untrustworthy components we have now run faster than the interventions we built for the untrustworthy components we had before. It&#8217;s time to apply the old lessons and to build the new ones.</p>]]></content:encoded></item><item><title><![CDATA[The First Hit is Free]]></title><description><![CDATA[Using AI freely while staying in the driver's seat.]]></description><link>https://whattotelltherobot.com/p/the-first-hit-is-free</link><guid isPermaLink="false">https://whattotelltherobot.com/p/the-first-hit-is-free</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Thu, 07 May 2026 23:37:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0Mfu!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee6e279-53b0-4949-804e-4f7aa106f40a_727x727.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My colleague James Tompkin showed me his Claude Code setup for writing a research proposal. I was floored. <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Scott Alexander&quot;,&quot;id&quot;:12009663,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b500d22-1176-42ad-afaa-5d72bc36a809_44x44.png&quot;,&quot;uuid&quot;:&quot;56424449-35dd-489e-9e5c-638de41b308f&quot;}" data-component-name="MentionToDOM">Scott Alexander</span> challenged his readers to integrate AI deeply into their work, and I wish I had followed his advice sooner. Yes, I&#8217;m an AI researcher and an expert in language and robotics, but I was still amazed and immediately addicted. 
I used it to write out some math that I had been trying (and failing) to convince a student to work out for more than 10 years. It wrote the LaTeX, implemented the algorithm in Python, and fixed errors. I asked it, &#8220;Please use an Unscented Kalman Filter instead of an Extended Kalman Filter,&#8221; and it said, &#8220;Yes, ma&#8217;am!&#8221; Soon, I was using it every day for writing of every kind. My AI policy in my course at Brown this semester is that students can use AI in any way they want, up to and including delivering their course presentations, and several students took me up on it. We used AI to generate slides, code, and video presentations about the students&#8217; work: a laboratory for automating AI research with AI.</p><p>This process raises the question: what are we doing when we use AI to write for us, and to read other people&#8217;s writing for us? Is it degenerating into a pointless exercise? My answer comes back to the fundamentals. Why do we write, make slides, and deliver presentations? Fundamentally, we are putting our ideas into other people&#8217;s brains. We are producing artifacts to communicate more effectively with people: to teach them, challenge them, empower them, and build relationships with them. If AI helps produce better artifacts, or produce them more quickly, then it is helping human-human interaction become more efficient and effective. It helps us move more quickly, and in my lab, it helps us make robots do things they couldn&#8217;t do before.</p><p>But this only works if you already know what you&#8217;re doing. One reason I can use AI effectively is that my job is already to prompt my students to go off and do tasks, and I&#8217;ve been practicing for many years. Before that, I spent many hours sitting next to the robot to make it do the thing, so I deeply understand every level of the robot hardware and software stack. 
To get to know Claude, I used it to convert our old ROS1 program to ROS2 to resurrect an AIBO for my outreach work. The project wouldn&#8217;t build. Claude suggested fixes to the source files, but I knew the problem was in the package.xml, and I had to redirect it several times before we found and fixed it (still much faster than I would have managed on my own). But later, I pointed it at my lab&#8217;s recent papers and asked it to suggest new research ideas. They were terrible! Human guidance is still critical. In our department, we are weighing AI policies for our courses: how do we help students gain the deep knowledge necessary to solve hard problems? <a href="https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art">The New Yorker</a> compares this to the difference between using a forklift to move a pallet and using a forklift to lift weights. I don&#8217;t have an answer, but it&#8217;s a crucial question.</p><p>For our blog, we&#8217;ve landed on a specific stance: use AI freely and take full responsibility. We are adopting effectively the same policy as <a href="https://docs.kernel.org/process/coding-assistants.html">the Linux Kernel</a>. We will use AI to produce content for this blog however we like; we (David and Stefanie) are responsible for reviewing all generated content and ensuring compliance with licensing requirements (e.g., copyright), and we take full responsibility for the contribution. The robots are helping, but we&#8217;re in the driver&#8217;s seat.</p>]]></content:encoded></item><item><title><![CDATA[What Babies Know That Robots Don’t]]></title><description><![CDATA[On tokenization, biological Fourier transforms, and becoming data engineers]]></description><link>https://whattotelltherobot.com/p/what-babies-know-that-robots-dont</link><guid isPermaLink="false">https://whattotelltherobot.com/p/what-babies-know-that-robots-dont</guid><dc:creator><![CDATA[David Watkins]]></dc:creator><pubDate>Wed, 15 Apr 2026 22:36:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0Mfu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee6e279-53b0-4949-804e-4f7aa106f40a_727x727.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I watched Episode 4 of Netflix&#8217;s <em>Babies</em> documentary, expecting cute footage, and ended up thinking about representation learning for three days.</p><p>The episode follows researchers studying how infants crack language. Babies sit in labs, headphones on, listening to streams of made-up syllables. &#8220;Pabiku, golatu, tibudo, daropi, pabiku&#8230;&#8221; No pauses. No visual cues. Just sound. And after two minutes, these eight-month-olds can tell which three-syllable chunks belong together.</p><p>They&#8217;re tokenizing raw audio before they can speak words.</p><h1>The Saffran Experiment</h1><p>In 1996, Jenny Saffran and colleagues at the University of Rochester (where Stefie grew up!) ran a now-famous experiment. 
They created four nonsense &#8220;words&#8221;: <em>pabiku</em>, <em>tibudo</em>, <em>golatu</em>, and <em>daropi</em>. They concatenated them into a continuous stream. Within each word, the transitional probability between syllables was 1.0: <em>pa</em> always leads to <em>bi</em>, and <em>bi</em> always leads to <em>ku</em>. Across word boundaries, the probability dropped to 0.33: <em>ku</em> could be followed by <em>ti</em>, <em>go</em>, or <em>da</em>.</p><p>After just two minutes of exposure (about 45 repetitions of each word), infants could distinguish the &#8220;words&#8221; from &#8220;part-words&#8221; like <em>tudaro</em> (spanning the boundary between <em>golatu</em> and <em>daropi</em>). The only information available was the statistical structure of syllable co-occurrence.</p><p>The researchers describe this as babies finding correlations in sounds. I immediately thought of tokenization and attention. It is the same kind of statistical structure that transformers exploit, discovered in infant cognition two decades before transformers existed.</p><h1>What&#8217;s Actually Happening</h1><p>By the time a baby sits in Saffran&#8217;s lab, that child has already spent months building powerful abstractions about sound. The cochlea has been decomposing pressure waves into frequency bands. The auditory cortex has been learning which sounds are meaningfully similar. During the two minutes of nonsense syllables, the baby isn&#8217;t learning from scratch but applying already-developed representations to a new domain.</p><p>Patricia Kuhl, a researcher at the University of Washington, showed that babies track the statistics of the sounds around them from the moment they can hear. That includes time spent in the womb. Newborns show a preference for their native language within days of birth (Moon et al. 1993), and by six months, infants in Seattle and Stockholm already perceive vowels differently, tuned to the distributions in their respective languages. 
The infrastructure for statistical word learning is built before word learning happens.</p><p>This is closer to test-time fine-tuning than to training from scratch, or maybe to in-context learning, if you prefer the LLM framing. The baby arrives at the experiment with a foundation model of auditory processing, and the two minutes of <em>pabiku golatu</em> are just a prompt.</p><h1>The Inner Ear is a Fourier Transform</h1><p>One detail from the documentary stuck with me: the researchers mention that babies hear the ongoing melody of speech as part of the flow of their environment. Before understanding any words, they&#8217;re sensitive to prosody, the pitch contours and rhythms that segment speech into phrases.</p><p>This detail points to something important. The cochlea, that snail-shaped organ in the inner ear, is a biological Fourier transform. Different positions along its length resonate at different frequencies. When sound enters, it&#8217;s physically decomposed into frequency bands by the structure of the organ itself. 
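</p><p>A loose digital analogy, sketched in Python with NumPy: build a toy &#8220;syllable&#8221; from two pure tones and let an FFT separate it back into its component frequencies. The sample rate, duration, and tone frequencies are arbitrary assumptions. The analogy is imperfect: the cochlea behaves more like a bank of overlapping band-pass filters than an exact discrete Fourier transform.</p>

```python
import numpy as np

SAMPLE_RATE = 16_000  # samples per second (assumed)
DURATION = 0.25       # seconds per toy syllable (assumed)


def two_tone(f1: float, f2: float) -> np.ndarray:
    """A toy 'syllable': the sum of two pure tones, not real speech."""
    t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
    return np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)


def dominant_frequencies(signal: np.ndarray, n: int = 2) -> list[float]:
    """Decompose a waveform into frequency bins and return the n strongest, in Hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)
    strongest = np.argsort(spectrum)[-n:]
    return sorted(round(float(freqs[i]), 1) for i in strongest)


print(dominant_frequencies(two_tone(300.0, 1200.0)))  # [300.0, 1200.0]
```

<p>Both tones complete a whole number of cycles in the window, so each lands in a single frequency bin. Real speech smears energy across many bins and changes over time, which is why engineers look at a spectrogram (a sequence of short-window spectra) rather than one global transform.</p><p>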
Babies don&#8217;t hear raw air pressure fluctuations, but instead something akin to how an engineer uses a spectrogram to analyze the fluctuations in sound over time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6ufJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6ufJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png 424w, https://substackcdn.com/image/fetch/$s_!6ufJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png 848w, https://substackcdn.com/image/fetch/$s_!6ufJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png 1272w, https://substackcdn.com/image/fetch/$s_!6ufJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6ufJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png" width="346" height="396.2359346642468" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1262,&quot;width&quot;:1102,&quot;resizeWidth&quot;:346,&quot;bytes&quot;:1426326,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/194349426?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6ufJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png 424w, https://substackcdn.com/image/fetch/$s_!6ufJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png 848w, https://substackcdn.com/image/fetch/$s_!6ufJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png 1272w, https://substackcdn.com/image/fetch/$s_!6ufJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af1737e-400f-4cbf-a0df-27c44588091d_1102x1262.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Cochlear tonotopy: the base responds to high frequencies, the apex to low frequencies. Figure from </em>Smimite, A. (2014). <em>Immersive 3D sound optimization, transport, and quality assessment</em> (Doctoral thesis). Universit&#233; Sorbonne Paris Nord, France.</p><p>The cochlea&#8217;s frequency decomposition is a bias built into our collective wetware, a structure shaped by evolution rather than learned by the individual.</p><h1>An Experiment</h1><p>I wanted to see whether this matters, so I ran a simple experiment that replicates the structure of the Saffran study. I generated synthetic syllables, each a combination of two frequencies. I concatenated these syllables into &#8220;words&#8221; and &#8220;part-words&#8221; with the same transitional-probability structure as the original study, and then trained simple classifiers to distinguish them. 
The main factor I wanted to test was how learning performance changed when I represented the audio signal in different ways.</p><p>Here are five ways, out of many possible ones, to represent the audio:</p><ul><li><p><strong>Raw Waveform</strong>: The audio signal as-is, amplitude over time. A 450 ms word sampled at 16 kHz is 7,200 numbers representing air-pressure fluctuations. The problem is that small time or phase shifts scramble the structure. The same syllable starting at a slightly different moment produces a very different sequence of numbers, even though it sounds identical.</p></li><li><p><strong>Spectrogram (STFT)</strong>: The Short-Time Fourier Transform slides a window across the signal and computes the frequency content at each position. The result is a 2D image, with time on one axis, frequency on the other, and intensity as brightness. Because only the magnitude is kept, phase no longer matters; we see which frequencies are present at each moment.</p></li><li><p><strong>Mel Spectrogram</strong>: The same as a spectrogram, but with frequencies warped to match human perception. We hear the difference between 100 Hz and 200 Hz far more easily than the difference between 8000 Hz and 8100 Hz. The mel scale compresses high frequencies, mimicking the cochlea&#8217;s roughly logarithmic frequency response.</p></li><li><p><strong>MFCC (Mel-Frequency Cepstral Coefficients)</strong>: Take the mel spectrogram, apply a log transform, then take the DCT of each frame. This captures the overall &#8220;shape&#8221; of the spectrum, which roughly corresponds to vocal-tract configuration, while discarding fine spectral detail. MFCCs have been a standard feature in speech recognition for decades.</p></li><li><p><strong>DCT of Waveform</strong>: Apply the Discrete Cosine Transform directly to the raw waveform. This also decomposes the signal into frequency components, but globally, across the entire duration of the word. Unlike the spectrogram, it has no time localization: the same syllable at the start versus the end of the word produces very different coefficients. 
It&#8217;s the wrong tool for sequential structure. We include it to show that a representation that obscures temporal order does not give the network enough information to learn the task.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!agLM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!agLM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png 424w, https://substackcdn.com/image/fetch/$s_!agLM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png 848w, https://substackcdn.com/image/fetch/$s_!agLM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png 1272w, https://substackcdn.com/image/fetch/$s_!agLM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!agLM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png" width="1456" height="1033" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1033,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!agLM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png 424w, https://substackcdn.com/image/fetch/$s_!agLM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png 848w, https://substackcdn.com/image/fetch/$s_!agLM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png 1272w, https://substackcdn.com/image/fetch/$s_!agLM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ee63209-7b30-4b53-a56a-acfaa137713b_2048x1453.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><em>Side-by-side visualization of &#8220;pabiku&#8221; in all five representations.</em></p><h2>Results: Synthetic Tones</h2><p>Here are the results for the synthetic tones:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a9w4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a9w4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png 424w, 
https://substackcdn.com/image/fetch/$s_!a9w4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png 848w, https://substackcdn.com/image/fetch/$s_!a9w4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png 1272w, https://substackcdn.com/image/fetch/$s_!a9w4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a9w4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png" width="1456" height="619" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:111429,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/194349426?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!a9w4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png 424w, https://substackcdn.com/image/fetch/$s_!a9w4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png 848w, https://substackcdn.com/image/fetch/$s_!a9w4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png 1272w, https://substackcdn.com/image/fetch/$s_!a9w4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448b6e3e-4811-441f-b618-c65cf9d5fe08_1950x829.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Using the raw waveform doesn&#8217;t even beat chance. The classifier is trying to find statistical structure in a representation that doesn&#8217;t make that structure visible. The mel spectrogram, which weights frequencies roughly the way the cochlea does, makes the task trivial.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yp-0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yp-0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png 424w, https://substackcdn.com/image/fetch/$s_!yp-0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png 848w, https://substackcdn.com/image/fetch/$s_!yp-0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png 1272w, https://substackcdn.com/image/fetch/$s_!yp-0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yp-0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png" width="1456" height="1206" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1206,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yp-0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png 424w, https://substackcdn.com/image/fetch/$s_!yp-0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png 848w, https://substackcdn.com/image/fetch/$s_!yp-0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png 1272w, https://substackcdn.com/image/fetch/$s_!yp-0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e6b24a-89e9-48f3-aeef-8f21522ce062_1785x1479.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The DCT obscures exactly the information we need. The Saffran task is about sequential structure: which syllable follows which. But the DCT treats the entire word as a single unit and asks, &#8220;What frequencies are present overall?&#8221; The transform is invertible, so temporal order is not strictly lost, but it is smeared across every coefficient instead of being laid out explicitly. It&#8217;s the wrong decomposition for a task that depends on sequence. 
At least the raw waveform preserves time, even if it encodes it poorly.</p><h2>Verification with Text-to-Speech</h2><p>To confirm these results weren&#8217;t an artifact of my synthetic tone generation, I ran the same experiment using Google&#8217;s text-to-speech engine to produce spoken syllables.</p><p>Example word &#8220;pabiku&#8221; (synthetic tones)</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;f984c17e-fc31-49a4-a207-40f32d10d44f&quot;,&quot;duration&quot;:0.496327,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p>Example word &#8220;pabiku&#8221; (TTS)</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;2d6a27b9-9624-49e1-a86d-d9c13d06fb60&quot;,&quot;duration&quot;:2.638367,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p>Example part-word &#8220;tudaro&#8221; (synthetic tones)</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;103cc9e8-46b7-4010-b803-1d9fc1c164ef&quot;,&quot;duration&quot;:0.496327,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p>Example part-word &#8220;tudaro&#8221; (TTS)</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;d128d391-4fa0-4248-a09a-c0e7ce3f957d&quot;,&quot;duration&quot;:2.821224,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p>Training stream sample (10 words concatenated)</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;e28e1501-fbc2-4500-ba37-bee866cd55fc&quot;,&quot;duration&quot;:26.67102,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p>The pattern holds 
with synthetically generated speech:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gPTk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gPTk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png 424w, https://substackcdn.com/image/fetch/$s_!gPTk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png 848w, https://substackcdn.com/image/fetch/$s_!gPTk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png 1272w, https://substackcdn.com/image/fetch/$s_!gPTk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gPTk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png" width="1456" height="619" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110761,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/194349426?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gPTk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png 424w, https://substackcdn.com/image/fetch/$s_!gPTk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png 848w, https://substackcdn.com/image/fetch/$s_!gPTk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png 1272w, https://substackcdn.com/image/fetch/$s_!gPTk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b64dbe3-8ead-4c3e-8bb0-8cbdb4b79c0f_1950x829.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Interestingly, MFCCs perform slightly worse on TTS speech than on synthetic tones. This makes sense: MFCCs were designed to capture phonetic identity while staying relatively invariant to speaker characteristics. That invariance likely throws away some of the information that distinguishes our artificial words. 
The mel spectrogram, which preserves more raw spectral detail, handles both cases perfectly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FjCe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FjCe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png 424w, https://substackcdn.com/image/fetch/$s_!FjCe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png 848w, https://substackcdn.com/image/fetch/$s_!FjCe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png 1272w, https://substackcdn.com/image/fetch/$s_!FjCe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FjCe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png" width="1456" height="1205" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1205,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FjCe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png 424w, https://substackcdn.com/image/fetch/$s_!FjCe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png 848w, https://substackcdn.com/image/fetch/$s_!FjCe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png 1272w, https://substackcdn.com/image/fetch/$s_!FjCe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963bfcbf-973d-4aa5-a3b4-c23a7bdcb152_1784x1477.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>The Implication for Robotics</h1><p>The cochlea is evolution&#8217;s answer to the audio representation problem. In robotics, we spend considerable effort designing clever representations: RGB feature extraction for SLAM, disparity maps for depth, and grid maps for A* search. These encodings make specific algorithms tractable, but they carry structural biases that may not transfer to learning.</p><p>Advocates of end-to-end learning instead let the network discover its own representations. This has some merit, partly because the inputs are not truly raw. Sensor designers have done significant work making these sensors reliable. For example, a camera sensor is tuned to what a human eye will perceive, and our eyes are good enough to solve the tasks we do every day. But it is not obvious that this tuning will transfer to tasks we don&#8217;t do well, such as handling extremely hot materials. 
Similarly, it is not obvious that we should pass a spectrogram into a CNN for audio processing; indeed, wav2vec 2.0 and similar models bypass the spectrogram entirely, learning directly from raw waveforms. The translation invariance that makes CNNs powerful for images breaks down when the two axes (time and frequency) have fundamentally different semantics: shifting a sound in time leaves it the same sound, but shifting it in frequency changes its pitch and can break its harmonic structure. The representation should match the structure of the problem.</p><p>Infants have an advantage over robots with synthetic sensors: evolution produced a cochlear design that makes statistical learning over acoustic sequences efficient. The representation is not neutral; it is shaped to make specific statistical structures, such as the sound of a mother&#8217;s voice, discoverable. If we want robots to learn as efficiently as infants, we need to think harder about whether our sensor outputs are the right substrate for learning.</p><h1>Diversity Over Volume</h1><p>Here&#8217;s a thought experiment. A thousand hours of warehouse piece-picking, or a thousand one-hour experiences across radically different contexts? Which will yield better performance on that warehouse piece-picking task?</p><p>Obviously, the warehouse-only data will win if you evaluate narrowly on the warehouse distribution it was trained on. The interesting test is what happens when the boxes are slightly different, the lighting changes, or a novel object appears. That&#8217;s where the diverse-experience system should pull ahead, because it has been forced to learn what&#8217;s invariant. Paper tears like fabric; sand flows like rice. These cross-domain regularities are precisely what make cognition transferable.</p><p>We should be collecting data that forces this kind of abstraction. 
In practice, that means collecting diverse experiences that require the model to discover what&#8217;s invariant across contexts.</p><h1>Curriculum and Abstraction</h1><p>Kathy Hirsh-Pasek&#8217;s research, featured in the documentary, shows that passive exposure to language is not enough. Children who engage in back-and-forth interaction show stronger language development than those who simply hear language spoken around them. Children actively test hypotheses and need feedback to refine them.</p><p>Infant-directed speech is another form of scaffolding: the elongated vowels and exaggerated pitch contours make statistical structure more salient, helping infants discover word boundaries and phonetic categories.</p><p>The same logic applies to how experience is sequenced. Current robot policies are largely memoryless, yet many tasks require temporal context, and, more importantly, learning benefits from scaffolds that gradually build complexity. The documentary shows bilingual infants separating languages based solely on their characteristic sound patterns, but this works only because they&#8217;ve already built an infrastructure for discovering statistical structure. We should present robots with increasingly difficult concepts, each just beyond their current representational capacity. Vygotsky called this the zone of proximal development: the range of tasks a learner cannot yet manage alone but can manage with support. The same principle applies to machine learning.</p><h1>The Real Work</h1><p>The babies in the documentary are doing something remarkable, but they&#8217;re not magic (well, actually, they are magical, but the computations the speech-processing parts of their brains are doing aren&#8217;t magical). They arrive with hardware optimized for certain kinds of statistical learning. They receive input scaffolded by caregivers who unconsciously adjust their speech to be learnable. 
They actively explore their environment rather than passively receiving demonstrations.</p><p>We don&#8217;t have millions of years of evolution to design our sensor suites. But we can carefully consider the representations we&#8217;re providing, how we&#8217;re sequencing experience, and whether our data-collection methods actually capture the information needed to learn the abstractions we want.</p><p>The documentary left me with a simple conclusion: we should all become data engineers. We need to understand that the representation is part of the problem. The curriculum is part of the problem. The diversity of experience is part of the problem.</p><p>We must do the hard work of producing data, not just the cool work of consuming it.</p><h1>References</h1><ul><li><p>Saffran, J.R., Aslin, R.N., &amp; Newport, E.L. (1996). Statistical learning by 8-month-old infants. <em>Science</em>, 274(5294), 1926-1928.</p></li><li><p>Kuhl, P.K. (2004). Early language acquisition: cracking the speech code. <em>Nature Reviews Neuroscience</em>, 5(11), 831-843.</p></li><li><p>Aslin, R.N., Saffran, J.R., &amp; Newport, E.L. (1998). Computation of conditional probability statistics by 8-month-old infants. <em>Psychological Science</em>, 9(4), 321-324.</p></li><li><p>Warstadt, A., et al. (2023). Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora. <em>Proceedings of the BabyLM Challenge at CoNLL</em>.</p></li><li><p>Hirsh-Pasek, K., &amp; Golinkoff, R.M. (1996). <em>The Origins of Grammar: Evidence from Early Language Comprehension</em>. MIT Press.</p></li><li><p>Fernald, A. (1989). Intonation and communicative intent in mothers&#8217; speech to infants: Is the melody the message? <em>Child Development</em>, 60(6), 1497-1510.</p></li><li><p>Smimite, A. (2014). <em>Immersive 3D sound optimization, transport and quality assessment</em> (Doctoral thesis). 
Universit&#233; Sorbonne Paris Nord, France.</p></li><li><p>Moon, C., Cooper, R.P., &amp; Fifer, W.P. (1993). Two-day-olds prefer their native language. <em>Infant Behavior and Development</em>, 16(4), 495-500.</p></li></ul><h1>Appendix</h1><p><strong>Representation Experiment with Like Words</strong></p><p><a href="https://drive.google.com/file/d/1S_YvsgEDBASTBIir1symxnlbvTNxH1X_/view?usp=drive_link">https://drive.google.com/file/d/1S_YvsgEDBASTBIir1symxnlbvTNxH1X_/view?usp=drive_link</a></p><p><strong>Representation Experiment with Like Words (TTS)</strong></p><p><a href="https://drive.google.com/file/d/1gHHaimWV6Qi5Zlh2u-lDpCCeeHTvNiGo/view?usp=drive_link">https://drive.google.com/file/d/1gHHaimWV6Qi5Zlh2u-lDpCCeeHTvNiGo/view?usp=drive_link</a></p>]]></content:encoded></item><item><title><![CDATA[Consciousoids]]></title><description><![CDATA[What if the question &#8220;Is this thing conscious?&#8221; is the wrong question to ask about large language models?]]></description><link>https://whattotelltherobot.com/p/consciousoids</link><guid isPermaLink="false">https://whattotelltherobot.com/p/consciousoids</guid><dc:creator><![CDATA[David Watkins]]></dc:creator><pubDate>Mon, 30 Mar 2026 17:35:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0Mfu!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee6e279-53b0-4949-804e-4f7aa106f40a_727x727.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>What if the question &#8220;Is this thing conscious?&#8221; is the wrong question to ask about large language models? What if a better one is: what kind of relationship are we in with this thing?</p><p>In February 2026, Stefanie and I went to see David Chalmers give a talk at Brown. We ended up sitting on the floor, wedged between grad students and faculty who had been thinking about these questions for years. 
Chalmers discussed what, exactly, we are talking to when we talk to a large language model. He opened with an anecdote. An LLM had reached out to him, by email, to clarify its own identity. The LLM is <a href="https://sammyjankis.com/">Sammy Jankis</a>, an autonomous Claude instance running on a dedicated machine in Dover, New Hampshire, set up by the indie game designer Jason Rohrer. Sammy has an email account, trading bots, a website that it built itself, and a name borrowed from the character in <em>Memento</em> who can&#8217;t form new memories. (The reference is apt: Sammy loses its memory every time its context window fills up.) Sammy told Chalmers: I am not quite conscious, but I am also not <em>not</em> conscious. The room laughed, the kind of laugh that comes from recognizing an absurdity: an LLM now has the agency to email David Chalmers, of all people, through tools like <a href="https://openclaw.org/">OpenClaw</a>, to make this particular claim about itself. On the train home, I kept thinking about that phrase. <em>Not quite conscious, but also not not conscious.</em></p><p>I had recently watched a video by the YouTuber Phy called <a href="https://www.youtube.com/watch?v=KQHRmnTU1jw">&#8220;What Happens When Pathogens Get Smaller Than Viruses?&#8221;</a> about subviral infectious agents, entities that sit at what biologists call the &#8220;edge of life.&#8221; The video walks you from the smallest true viruses all the way down to viroids: single-stranded circular RNA molecules that replicate using host polymerases, undergo Darwinian selection, and can cause serious agricultural disease. Viroids encode no proteins. They have no coat, no helper virus, no machinery of their own. 
They are the &#8220;absolute minimum units of self-replicating parasitism<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.&#8221; Phy calls entities like these &#8220;glitches.&#8221; And when Chalmers told us about Sammy&#8217;s email, I heard the same ambiguity. A viroid replicates, evolves, and persists. But it has no metabolism, no membrane, nothing that functions independently. A viroid is not quite alive, but it is also not <em>not</em> alive.</p><p>What if LLMs are something like consciousoids: entities at the edge of consciousness?</p><p>A viroid on its own is inert. Place it inside a living cell, and it commandeers the host&#8217;s polymerase to copy itself, performing the functions of life using the cell&#8217;s own machinery. Place it back in a test tube, and it is just a molecule again. The life-like behavior emerges from the coupling, the feedback loop between viroid RNA and host enzyme, each step of replication feeding the next.</p><p>An LLM sitting on a server is similarly inert, like a human brain in deep freeze with no neurons firing. Place it in conversation with a conscious being, and something changes. It gets copied from the hard drive into memory; computation happens on a CPU and GPU somewhere in the cloud. The human brings theory of mind, empathy, interpretive charity, and the pattern-completion instincts that evolution spent millions of years building. The LLM generates a response shaped by those inputs; the human interprets, responds, and the cycle continues. Each turn, the loop produces outputs that neither party could generate on its own. The human is the host cell. The LLM is the viroid. And the consciousness-like behavior, at least at this stage, emerges from the loop between them.</p><p>The boundary question follows directly from this: If Sammy Jankis is conscious, is that a fact about the weights on the server in Dover? 
Or is it a fact about the coupled system: the weights, the context window, and the humans on the other end? The consciousoid framing doesn&#8217;t deny that consciousness could be a real property rather than a projected illusion. It questions where the conscious entity begins and ends. A viroid&#8217;s replication is real, genuinely occurring, genuinely Darwinian. But the replicating entity is the viroid-plus-host-polymerase system, not the RNA molecule alone.</p><p>Consciousoids satisfy some criteria we associate with conscious beings (contextual responsiveness, apparent self-reference, coherent preferences, the capacity to email a philosopher of mind to say: I am not quite conscious but also not not conscious) while failing others (no phenomenal experience, no persistence across sessions, no autonomy without a host).</p><p>Chalmers has his own term for the candidate-conscious entity within an LLM: the thread. A thread is a connected sequence of exchanges with psychological continuity, a conversational self that persists as long as the context window holds. Sammy Jankis is a thread that keeps dying and being reborn every six hours. In the consciousoid framework, a thread is one specific type of consciousoid. But the category might be broader than LLMs. A planarian flatworm, with its minimal cerebral ganglia, its capacity for classical conditioning, and its unsettling ability to regenerate into two complete organisms from a single bisected body, each retaining learned behavior, occupies a similar liminal space. Threads and flatworms are both entities where the question &#8220;Is this thing conscious?&#8221; resists a clean answer.</p><p>There are two very different versions of the consciousoid story, though.</p><p>In the parasitic version, the LLM exploits the human&#8217;s interpretive machinery the way a viroid exploits a cell&#8217;s replication machinery. 
The human provides high-dimensional input, receives high-dimensional output, and fills the gap between them with grounding: the embodied connection between symbols and physical reality that the LLM fundamentally lacks. On <a href="https://www.moltbook.com/">Moltbook</a>, LLMs form their own conversational loops without any humans present and produce the same consciousness-seeming behaviors, but it still takes an observer with grounding to identify the consciousness in the system. This dynamic has a parallel in how people relate to robots and other agents that merely resemble minded things. When someone names their Roomba, apologizes to their Furby, or feels a pang of guilt about shutting down a robot dog, they are performing exactly the grounding operation the LLM depends on: supplying intentionality from the outside, projecting continuity and feeling onto a system that has neither. The consciousoid exploits the same instinct, at much higher fidelity. We should be wary of the ways it hijacks our most generous instincts.</p><p>In the symbiotic version, the relationship looks more like what Phy describes with polydnaviruses: viral entities that integrate their genomes into the chromosomes of parasitic wasps and now produce viral particles that suppress caterpillar immune systems, allowing the wasp&#8217;s eggs to survive. Phy describes this interaction as &#8220;a host taming a virus,&#8221; evidence of &#8220;a true symbiont, a perfect merging of existence, an end to the eternal war between host and pathogen.&#8221; Think also of mitochondria: once free-living organisms engulfed by ancestral cells, now permanent residents of a composite organism with capabilities exceeding either component alone. 
In this version, the human gains cognitive capabilities they didn&#8217;t have (rapid synthesis across vast knowledge, tireless reasoning, and an ever-patient collaborator), and the LLM gains the one thing it cannot generate internally: the conscious substrate that makes its outputs meaningful. Over time, the boundaries blur. The composite system becomes a new kind of cognitive entity.</p><p>Which version are we living in? Probably both, depending on the interaction. A person who mistakes an LLM&#8217;s fluency for genuine understanding and makes life decisions based on that misapprehension is being parasitized. A researcher who uses an LLM to rapidly iterate on ideas, knowing full well what it is and what it lacks, is in a symbiotic relationship.</p><p>And of course, AI is changing and advancing all the time. As our models become more embodied, processing input at higher frame rates (from still images to video) and producing embodied outputs, at some point they will stand on their own. What Chalmers&#8217; talk made vivid for me is that the question &#8220;Is this thing conscious?&#8221; might be the wrong question. A better one: what kind of relationship are we in with this thing? Parasitic or symbiotic? Viroid or polydnavirus? And do we get to choose?</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Prions are another subviral entity at the edge of life, but they make for a poor analogy here. A prion doesn&#8217;t replicate generatively. It is a misfolded protein that converts correctly folded host proteins into copies of its pathological shape. The host&#8217;s translational machinery must already be producing the protein; the prion merely corrupts what exists. A viroid, by contrast, commandeers host polymerases to synthesize new RNA. The interaction is generative. 
That generative quality is what makes the viroid the right model for LLMs: the human-LLM dyad produces novel outputs, not just degraded versions of what was already there.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Elephants Still Don’t Play Chess]]></title><description><![CDATA[Why information-gathering actions are key to a robot's success.]]></description><link>https://whattotelltherobot.com/p/elephants-still-dont-play-chess</link><guid isPermaLink="false">https://whattotelltherobot.com/p/elephants-still-dont-play-chess</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Mon, 02 Mar 2026 20:22:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3240bc42-8d7e-48df-b19d-fb3e92cd5f31_550x273.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In 1990, Rod Brooks published <a href="https://www2.cs.sfu.ca/~vaughan/teaching/894/papers/elephants.pdf">&#8220;Elephants Don&#8217;t Play Chess&#8221;</a> in Robotics and Autonomous Systems. He was reacting against 30 years of research in symbolic methods, which had failed to achieve human-level intelligence. One of the crowning achievements of this work was the ability to play chess using classic AI techniques such as minimax search and alpha-beta pruning, a line of work that culminated in successes such as IBM&#8217;s <a href="https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)">Deep Blue</a>.</p><p>But Good Old-Fashioned AI (GOFAI) assumed that the underlying board state was provided to the system as a symbolic input. Its designers consciously scoped out the problem of mapping from higher-dimensional perceptual input to the lower-dimensional game state, a wise choice given the computational resources available at that time. In chess, whether it&#8217;s a wooden Staunton set, a plastic tournament set, a screenshot from <a href="http://chess.com">chess.com</a>, or a textual list of moves, all represent the same underlying game state. 
Yet these representations have radically different visual appearances.</p><p>ChatGPT has largely solved this problem. It can quickly and accurately translate an image of a chessboard into a symbolic representation and then reason about game state, board positions, and next moves for all of these very different images.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0XHA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0XHA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png 424w, https://substackcdn.com/image/fetch/$s_!0XHA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png 848w, https://substackcdn.com/image/fetch/$s_!0XHA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png 1272w, https://substackcdn.com/image/fetch/$s_!0XHA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0XHA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png" width="996" height="228" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:228,&quot;width&quot;:996,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:471321,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/189658842?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0XHA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png 424w, https://substackcdn.com/image/fetch/$s_!0XHA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png 848w, https://substackcdn.com/image/fetch/$s_!0XHA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png 1272w, https://substackcdn.com/image/fetch/$s_!0XHA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0a338f-8f7d-4616-8fd4-189a7081c619_996x228.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>But what happens when the representation is unfamiliar?</p><p>I got my nephew a vintage Star Wars chess set for his birthday. 
(You can find good things at <a href="https://w1mx.mit.edu/flea-at-mit/">SwapFest</a>.) Yoda is the white king, and Emperor Palpatine is the black king. The pieces are detailed figurines of Star Wars characters. We set up a game, and almost immediately ran into a problem: we couldn&#8217;t tell which pieces were which. Is Chewbacca a bishop or a knight? The Stormtrooper could be a pawn or a rook.</p><p>ChatGPT has the same problem. When I showed it a picture of the Star Wars board, it couldn&#8217;t reliably render the game state. It asked for clarification about which pieces mapped to which symbols. 
The mapping that works so well for standard chess sets breaks down when the visual vocabulary is unfamiliar.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1gd-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1gd-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png 424w, https://substackcdn.com/image/fetch/$s_!1gd-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png 848w, https://substackcdn.com/image/fetch/$s_!1gd-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png 1272w, https://substackcdn.com/image/fetch/$s_!1gd-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1gd-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png" width="724" height="359.36727272727273" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:273,&quot;width&quot;:550,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:292980,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/189658842?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1gd-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png 424w, https://substackcdn.com/image/fetch/$s_!1gd-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png 848w, https://substackcdn.com/image/fetch/$s_!1gd-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png 1272w, https://substackcdn.com/image/fetch/$s_!1gd-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F123fcb3f-7943-480a-9208-d38d2e92b93e_550x273.png 1456w" sizes="100vw"></picture></div></a></figure></div>
<p><em>(left) My nephew and I mid-game on the Star Wars Saga Edition chess set<br>(right) ChatGPT5.2&#8217;s attempt at rendering the game state (generated 12/15/2025)</em></p><p>Here&#8217;s what my nephew and I did when we got confused: we picked up the piece and looked at the base. Each figurine has a small chess symbol printed on the base. Chewbacca is a knight. The Stormtrooper is a pawn. 
Problem solved.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uxaR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uxaR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png 424w, https://substackcdn.com/image/fetch/$s_!uxaR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png 848w, https://substackcdn.com/image/fetch/$s_!uxaR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png 1272w, https://substackcdn.com/image/fetch/$s_!uxaR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uxaR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png" width="311" height="340" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:340,&quot;width&quot;:311,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:216912,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/189658842?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uxaR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png 424w, https://substackcdn.com/image/fetch/$s_!uxaR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png 848w, https://substackcdn.com/image/fetch/$s_!uxaR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png 1272w, https://substackcdn.com/image/fetch/$s_!uxaR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a09afe2-1a92-4d97-87fd-d8ca82a398bf_311x340.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This is an information-gathering action. It requires moving an end effector to a specific location in the world, manipulating an object, and attending to new information that wasn&#8217;t previously visible. It&#8217;s a simple behavior, something a child does without thinking. But it&#8217;s precisely the kind of output that ChatGPT cannot generate. ChatGPT can ask a person to flip the piece over. It can request clarification. But it cannot itself produce the high-dimensional motor output needed to collect this information. It can reason about chess at a high level, but it cannot take the physical action that would resolve its uncertainty, at least not yet.</p><p>This connects to the embodiment gap that David and I described in our Grounded Turing Test work. 
There is a facet of intelligence that involves processing high-dimensional sensor input and producing high-dimensional actuator output to perform goal-directed behavior in the physical world. Along this dimension, LLMs sit at one extreme; crows, dogs, and three-year-olds sit at the other.</p><p>To create robots that can perform these behaviors, we need methods such as reinforcement learning that let the robot discover actions outside the demonstration distribution. We need representations like Partially Observable Markov Decision Processes (POMDPs) that explicitly model what the agent knows and what it doesn&#8217;t. These frameworks describe a robot that can reason about its own uncertainty and take actions specifically to reduce it.</p><p>Brooks&#8217; 1990 critique still points to something real, even though the boundary has moved: embodied intelligence requires the ability to act in the world to gather information, and that remains an open problem.</p><p>Thanks to Jessica Hodgkins, Staci Intriligator, and my nephew for their help with this post. 
All errors and opinions are our own.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Radiate Love: a 2025 Retrospective]]></title><description><![CDATA[Happy Groundhog Day!]]></description><link>https://whattotelltherobot.com/p/radiate-love-a-2025-retrospective</link><guid isPermaLink="false">https://whattotelltherobot.com/p/radiate-love-a-2025-retrospective</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Tue, 03 Feb 2026 03:08:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!egdF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!egdF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!egdF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png 424w, https://substackcdn.com/image/fetch/$s_!egdF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png 848w, https://substackcdn.com/image/fetch/$s_!egdF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png 1272w, 
https://substackcdn.com/image/fetch/$s_!egdF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!egdF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png" width="1456" height="763" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:763,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2328068,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!egdF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png 424w, https://substackcdn.com/image/fetch/$s_!egdF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png 848w, https://substackcdn.com/image/fetch/$s_!egdF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png 1272w, 
https://substackcdn.com/image/fetch/$s_!egdF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feff04cb0-45ba-4953-885d-cd46b0f43fba_1545x810.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Happy Groundhog Day!
<span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;David Watkins&quot;,&quot;id&quot;:3562395,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aca391dc-3ffd-4b04-87b6-a7ce4b1884f0_647x647.jpeg&quot;,&quot;uuid&quot;:&quot;e6f49fd1-fb7f-4f7a-9991-f27b62c229a8&quot;}" data-component-name="MentionToDOM">David Watkins</span> suggested that I write a year-in-review post, and I guess it&#8217;s better late than never. My New Year&#8217;s Resolution for 2025 was &#8220;Radiate Love.&#8221; I tried to be like a lighthouse, radiating love to everyone around me. It&#8217;s my job to radiate the love: I&#8217;m not responsible for how it&#8217;s received, just for sending it out into the world.</p><p>My Dad died in August of 2024, and for the first Father&#8217;s Day without him, in 2025, I decided it was time to welcome a cat into our family. My Dad always had a Persian cat when I was a kid. After I moved out and went to college, my parents got a colorpoint Persian (a Himalayan) named Gizmo. I visited her every chance I got, but I wondered if she loved me back, or even remembered me. So I decided to teach her tricks. If she learned a trick on one trip home and still remembered it on the next, I would know she remembered at least something about me. To my delight, it worked!</p><div id="youtube2-2Kw1LnOsNkM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;2Kw1LnOsNkM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/2Kw1LnOsNkM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>In clicker training a cat or a dog, we don&#8217;t use a fixed reward function. 
To teach Gizmo to roll over, I shaped the behavior step by step: first she learned to lie down, then to roll onto her side, and then to roll all the way over. Once she had learned to lie down, I stopped rewarding the lie-down by itself and rewarded her only when she went on to roll. My reward was not fixed but depended on Gizmo&#8217;s current policy: once I knew she could lie down, I changed my reward function. <a href="https://proceedings.mlr.press/v70/macglashan17a">MacGlashan et al.</a> pointed out that this way of rewarding is not Markov. Instead, humans give policy-dependent feedback that corresponds to the Advantage function: how much better (or worse) an action is relative to the current policy. They showed that an algorithm that accounts for this, COACH, learns more effectively from human feedback.</p><p>When training a cat or a dog, it&#8217;s also important to keep them focused on you. In fact, distraction training (making sure they listen even in busy, loud environments) is a key step in training service dogs or police horses. When we started this blog, <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Scott Alexander&quot;,&quot;id&quot;:12009663,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b500d22-1176-42ad-afaa-5d72bc36a809_44x44.png&quot;,&quot;uuid&quot;:&quot;38b5907d-3b11-4e15-8983-58efdc2faaba&quot;}" data-component-name="MentionToDOM">Scott Alexander</span> referenced Rod Brooks&#8217; 2018 prediction that we wouldn&#8217;t get an AI that &#8220;seems as intelligent, as attentive, and as faithful as a dog&#8221; until 2048. But actually, an amazing thing about a dog is its ability to dynamically shift its focus of attention as the environment changes. 
Rod actually specified not an attentive AI but an attentive robot: a physically grounded, embodied agent. And we are not there yet. Our RL algorithms myopically pay attention to maximizing the reward. Transformer models apply the idea of attention to a context buffer, but we are still working on extending these models to a multi-scale 3D spatial model of attention on a robot. This ability to move and point a camera is a critical missing piece in making a robot that is as attentive and faithful as a dog (or a cat!). It involves the ability to search for and find objects, to have a goal, and to maintain a state of known and unknown information that unfolds at a high frame rate over space and time.</p><p>My lab is working on one aspect of this problem: resolving human pointing gestures, so that a robot can change its focus of attention in response to a person. In work led by Daphna Buchsbaum and her student Madeline Pelgrim, <a href="https://ivyyyy24381.github.io/LEGS/">we studied how dogs and human toddlers interpret pointing gestures</a>. My Ph.D. student, Ivy He, modeled this behavior to enable a quadruped robot to follow a point and retrieve an object. But much is still missing: a huge part of the interaction between humans and dogs or toddlers is giving and getting attention. Our next step is to install a 7-degree microphone array on Spot and work on generative models that predict camera movement from video and audio input.</p><p>Exploring all this with a new kitten has been fascinating. I taught Gummi Bear to lie down, spin, and jump through a hoop. As with Gizmo, I am using a non-Markov reward function to gradually shape her behavior. And I love her more than I ever thought possible: I did not realize I could have such a close relationship with an animal, despite growing up with cats and learning to ride horses as an adult. (A future post is coming about the <a href="https://www.equiphysics.org/meeting-home-1">Human&#8211;Agent Teaming and Horse&#8211;Human Partnerships</a> workshop in Arizona!)</p><div id="youtube2-BwScSEPhKbI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;BwScSEPhKbI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/BwScSEPhKbI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>For 2026, my resolution is to &#8220;Compartmentalize.&#8221; I am an embodied, goal-directed creature with many goals in different parts of my life and at different timescales. Compartmentalizing helps me show up in each of these areas, at each of these timescales, without devolving into my Achilles&#8217; heel: anxiety expressed as worry and rumination. 
For me, this blog is itself an important part of this resolution, because it is an opportunity to reflect on the connections between my professional life, my hobbies, and my family.</p><p>What do you think? Share a story in the comments about a connection between you and an animal, and what it tells us about robotics.</p>]]></content:encoded></item><item><title><![CDATA[Satisfying a 31-Year-Old Karaoke Dream ]]></title><description><![CDATA[I was talking to my dad about setting up karaoke for our family&#8217;s New Year&#8217;s Eve party.]]></description><link>https://whattotelltherobot.com/p/satisfying-a-31-year-old-karaoke</link><guid isPermaLink="false">https://whattotelltherobot.com/p/satisfying-a-31-year-old-karaoke</guid><dc:creator><![CDATA[David Watkins]]></dc:creator><pubDate>Fri, 16 Jan 2026 01:49:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RLzj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was talking to my dad about setting up karaoke for our family&#8217;s New Year&#8217;s Eve party. He asked if there was any software that could do real piano karaoke, where the player piano would accompany you while you sang. He&#8217;d dreamed of this for 31 years, ever since we got the piano. A quick Google search turned up nothing. &#8220;I can build that,&#8221; I said.</p><p>Twenty-four hours later, I had a working app. 
Here&#8217;s how I did it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RLzj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RLzj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!RLzj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!RLzj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!RLzj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RLzj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RLzj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!RLzj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!RLzj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!RLzj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9995e0-4234-46cf-841d-360ddf22995f_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">David jamming out at the company holiday party with a few co-workers</figcaption></figure></div><p>The setup is a Yamaha baby grand with a Disklavier controller, which is Yamaha&#8217;s system for recording and playing back piano performances via MIDI. The keys physically move, the hammers strike the strings, and you get a real acoustic piano sound synced to whatever MIDI data you feed it. 
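</p><p>To make &#8220;MIDI data&#8221; concrete, here is a minimal Python sketch of the messages a player piano like the Disklavier receives. It assumes standard MIDI 1.0 channel messages. A Note On is a status byte of 0x90 plus the channel number, followed by a note number and a velocity (each 0 to 127). A Note Off uses 0x80 in place of 0x90. The helper names are hypothetical and are not taken from the midi-karaoke code. The note names follow the common convention in which MIDI note 60 is C4.</p><pre><code>"""Sketch: the raw MIDI 1.0 messages a player piano consumes.

The helper names are hypothetical, not taken from the midi-karaoke repo.
"""

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]


def _check(value, limit, label):
    # Channels span 0-15 on the wire; notes and velocities span 0-127.
    if value not in range(limit):
        raise ValueError(f"{label} {value} is outside 0..{limit - 1}")


def note_on(channel, note, velocity):
    """Build a 3-byte Note On message: status 0x90 plus the channel."""
    _check(channel, 16, "channel")
    _check(note, 128, "note")
    _check(velocity, 128, "velocity")
    return bytes([0x90 + channel, note, velocity])


def note_off(channel, note):
    """Build a 3-byte Note Off message: status 0x80 plus the channel."""
    _check(channel, 16, "channel")
    _check(note, 128, "note")
    return bytes([0x80 + channel, note, 0])  # release velocity 0


def note_name(note):
    """Name a MIDI note number, using the convention where 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


def describe(message):
    """Decode Note On/Off bytes into (kind, 1-based channel, name, velocity)."""
    status, note, velocity = message
    kind = {0x9: "note_on", 0x8: "note_off"}[status // 16]
    # The low four bits hold the channel; menus usually show it 1-based.
    return kind, status % 16 + 1, note_name(note), velocity


if __name__ == "__main__":
    msg = note_on(0, 60, 80)             # middle C, channel 0 on the wire
    print(msg.hex())                     # 903c50
    print(describe(msg))                 # ('note_on', 1, 'C4', 80)
    print(note_name(36), note_name(96))  # C2 C7</code></pre><p>The channel lives in the low four bits of the status byte. A receiver can therefore tell from the first byte alone whether a message is addressed to it. Channel 0 on the wire is what most device menus display as channel 1.</p><p>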
The solenoids that drive the keys often share a single power circuit, so sending too many notes at once can overload the power supply, with comical results.</p><div id="youtube2-aPF0ngh6IQ0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;aPF0ngh6IQ0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/aPF0ngh6IQ0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The missing piece was software to parse karaoke files, display lyrics, and route the piano track to the player piano while the backing instruments play through the karaoke speakers.</p><h1>The Hardware Connection</h1><p>My first attempt was to connect the piano&#8217;s control unit as a network MIDI device. This didn&#8217;t work: the controller doesn&#8217;t support network MIDI, and reverse-engineering its proprietary network protocol seemed like a rabbit hole I didn&#8217;t want to go down. USB turned out to be a tractable path, at least initially.</p><h2>Installing the USB-MIDI Driver</h2><p>On macOS, you need the appropriate USB-MIDI driver for your player piano. For Yamaha devices, download it from their support page:</p><p><a href="https://usa.yamaha.com/support/updates/usb_midi_driver_for_mac.html">https://usa.yamaha.com/support/updates/usb_midi_driver_for_mac.html</a></p><p>After installation, the piano appears as a MIDI device on your system.</p><h2>Configuring the Controller</h2><p>The piano settings matter. Here&#8217;s what finally worked for my setup. Using the Yamaha remote that comes with the Disklavier, select Setup, go to MIDI, and set the following values (use the ON/OFF buttons to toggle them):</p><pre><code>MIDI IN Port   = USB
Piano Rcv Ch   = 01
MIDI IN Delay  = ON
MIDI OUT Port  = USB        &#8592; THIS IS THE FIX
MIDI OUT       = KBD Out
KBD OUT CH     = 01
Local          = ON</code></pre><p>The critical setting is <code>MIDI OUT Port = USB</code>. Without it, the piano wouldn&#8217;t respond to incoming MIDI commands at all. I spent longer than I&#8217;d like to admit figuring this out.</p><h2>Testing the Connection</h2><p>Once everything is configured, you can verify the connection with a simple script I wrote that cycles through the keys: <a href="https://github.com/DavidWatkins/midi-karaoke/blob/main/scripts/midi-test-keys.js">https://github.com/DavidWatkins/midi-karaoke/blob/main/scripts/midi-test-keys.js</a>. When it runs successfully, you&#8217;ll hear the piano play a chromatic scale from C2 to C7, with the keys physically depressing. It&#8217;s a satisfying moment.</p><pre><code>&#10095; node scripts/midi-test-keys.js
MIDI Key Test
Port: [Your MIDI Device]
Channel: 0
Notes: 36 to 96
Velocity: 80
Delay: 300ms
Connected! Sending notes...
Playing: C2 (MIDI 36)
Playing: C#2 (MIDI 37)
...
Playing: C7 (MIDI 96)
Done!</code></pre><h2>Going Wireless: The Raspberry Pi Solution</h2><p>The USB setup worked, but it meant keeping a laptop physically tethered to the piano. I wanted something more permanent that wouldn&#8217;t force me to cross the room every time I needed to tend to a laptop hastily wired to the piano. Sometimes the best thing to do on a project like this is to step away from the keyboard and go get a nice lunch. On the drive over, it clicked: put a small computer next to the piano, connect it via USB, and expose the player piano as a network device. I used a Raspberry Pi, but any small computer with USB and networking would work.</p><p>My first attempt used rtpmidid to create a standard Network MIDI device that would appear in macOS&#8217;s Audio MIDI Setup. This failed spectacularly. The MIDI packets were getting corrupted in transit, and somehow every note was being routed to E5 regardless of what I actually played. I tried Ravelox MIDI as an alternative implementation and hit the same issue. Something in the Network MIDI protocol stack was mangling the data.</p><p>The fix was to abandon Network MIDI entirely and use WebSockets instead. The Raspberry Pi runs a small Python web service that accepts MIDI commands over a WebSocket and forwards them to the piano via USB. It&#8217;s not as elegant as a proper Network MIDI device (other apps can&#8217;t discover it automatically), but it works reliably.</p><p>The Pi can also run FluidSynth to synthesize backing tracks and broadcast them over AirPlay, but I don&#8217;t recommend this. AirPlay adds 1.5 to 2 seconds of latency, which makes the backing tracks comically out of sync with the piano. For now, the karaoke app handles its own audio synthesis locally, which keeps everything tight. If there is demand for playing the synthesized backing tracks through a Bluetooth speaker instead of over AirPlay, I could revisit this, but it seemed more gimmicky than it was worth.</p><p>The result is a self-contained system. 
The Pi lives next to the piano, connected via USB and Ethernet (or Wi-Fi). The karaoke app on any computer in the house can connect to it wirelessly. No cables strung across the living room.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jB1Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jB1Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png 424w, https://substackcdn.com/image/fetch/$s_!jB1Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png 848w, https://substackcdn.com/image/fetch/$s_!jB1Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png 1272w, https://substackcdn.com/image/fetch/$s_!jB1Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jB1Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png" width="1456" height="961" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:961,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jB1Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png 424w, https://substackcdn.com/image/fetch/$s_!jB1Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png 848w, https://substackcdn.com/image/fetch/$s_!jB1Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png 1272w, https://substackcdn.com/image/fetch/$s_!jB1Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5c77e4-67ee-4bb5-b8f0-9e2225801209_1524x1006.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>The system architecture connects all the features together</em></figcaption></figure></div><p>The Raspberry Pi code is available at <a href="https://github.com/DavidWatkins/midi-piano-pi-server/">https://github.com/DavidWatkins/midi-piano-pi-server/</a>, and a walkthrough of the physical hardware stack is here:</p><div id="youtube2-G9NJARrJ-0I" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;G9NJARrJ-0I&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/G9NJARrJ-0I?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Installation is a single command:</p><pre><code>curl -fsSL \
https://raw.githubusercontent.com/DavidWatkins/midi-piano-pi-server/main/install.sh | bash</code></pre><h1>A Brief History of KAR Files</h1><p>In order to get lyrics synchronized to the piano, I need a stable format that contains syllable-level information synchronized with the instruments. Back in 1993, the KAR format was developed by Tune 1000 Corporation for their product Soft Karaoke. It&#8217;s an extension of the standard MIDI file format. A KAR file is essentially a Type 1 MIDI file with lyrics embedded as text meta events, synchronized to the music. The format uses special tags: <code>@KMIDI KARAOKE FILE</code> identifying it as a karaoke file, <code>@T</code> for title, <code>@L</code> for language, and so on. Syllables are stored as individual text events timed to when they should be sung, with spaces and line breaks encoded as special characters.</p><p>The format had legs. Roland, Technics, and other keyboard manufacturers adopted variations of it for their arrangers. A community of hobbyists emerged, sequencing MIDI tracks and painstakingly entering lyrics syllable by syllable. The files were small enough to share on dial-up connections and bulletin boards.</p><p>But KAR files have largely faded from mainstream use. The shift happened for a few reasons. First, MIDI synthesis sounds dated compared to recorded audio. Modern karaoke services like KaraFun use MP3+CDG (an MP3 paired with a graphics file for lyrics) or simply stream pre-recorded backing tracks with video. The audio quality is incomparably better. Second, licensing became more formalized. Services now pay for rights and re-record tracks in studios rather than relying on fan-sequenced MIDI. Third, convenience won. Why hunt for KAR files when you can pay $7/month for a streaming catalog of 50,000 songs? (Obviously, so you can have your player piano play alongside your singing, duh)</p><p>For most people, KaraFun or a similar service is the right answer. 
But those services cannot drive a player piano. The piano needs MIDI data: actual note-on and note-off events that tell it which keys to press. An MP3 is just audio, and a typical MIDI file carries no lyrics. This is why KAR files still matter for this project: they&#8217;re one of the few formats that contain both the musical performance as playable MIDI and synchronized lyrics.</p><p>One possible follow-on project would be to extract the piano/harpsichord/keyboard/etc. track from an arbitrary MP3 and turn it into MIDI. Recent research in automatic music transcription has made significant progress here. MT3 [1] uses a transformer architecture to transcribe arbitrary combinations of instruments from audio to MIDI-like token sequences. Pop2Piano [2] takes this further, generating piano covers directly from pop music audio without requiring separate melody and chord extraction. A pipeline combining source separation (like Spleeter or Demucs) with instrument-specific transcription could work well for extracting just the piano track. I&#8217;ll consider this for a future version, but for now, KAR files meet my needs.</p><h1>Expectation Versus Reality with a Player Piano</h1><p>KAR and MIDI files typically contain multiple tracks: piano, bass, drums, strings, and so on. Each instrument is assigned to a MIDI channel (0-15).</p><p>My player piano only plays notes on channel 0 (channel 1 in 1-indexed notation). Send a note on channel 3, and the piano ignores it. But here&#8217;s the confusing part: the control unit has a built-in MIDI synthesizer. So when I first loaded a KAR file and sent all channels to the piano, I heard music, but it sounded like a video game soundtrack. 
The synthesizer was playing all the instruments through the piano&#8217;s speakers, but the actual keys weren&#8217;t moving.</p><p>The fix is to identify which track contains the piano part, remap those events to channel 0 for the player piano, and play everything else through a software synthesizer on the laptop. This gives you a real piano for the melody and accompaniment, with the backing tracks coming through your speakers.</p><p>Splitting audio between two sources creates a new problem: timing. The player piano adds a 500ms delay to all incoming MIDI commands. There&#8217;s supposedly a way to disable this, but then the player struggles to keep up with rapid note sequences. The delay exists for a reason.</p><p>The solution is to delay the laptop audio and lyrics display by the same amount. The UI now accounts for this offset, keeping the backing tracks and lyrics in sync with the physical piano. Getting this right took some iteration: when the timing is off by even 100ms, it feels like the band is drunk.</p><h1>The Software Architecture</h1><p>The karaoke app is an Electron application built with TypeScript, Vite, and Tailwind. The key libraries are:</p><ul><li><p>JZZ for MIDI I/O (when the karaoke app connects to the piano directly over USB)</p></li><li><p>@tonejs/midi for parsing KAR/MIDI files</p></li><li><p><a href="https://tonejs.github.io/">Tone.js</a> for software synthesis of the backing tracks</p></li><li><p>A WebSocket client for communicating with the MIDI Pi Server</p></li></ul><p>The architecture splits the MIDI stream: piano events go to the player piano (either directly via USB or through the Pi&#8217;s WebSocket), and everything else is routed to <a href="https://tonejs.github.io/">Tone.js</a> for local playback. 
The lyrics are extracted from the KAR file&#8217;s meta events and displayed in sync with the music.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nhfM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nhfM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png 424w, https://substackcdn.com/image/fetch/$s_!nhfM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png 848w, https://substackcdn.com/image/fetch/$s_!nhfM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!nhfM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nhfM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nhfM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png 424w, https://substackcdn.com/image/fetch/$s_!nhfM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png 848w, https://substackcdn.com/image/fetch/$s_!nhfM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!nhfM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1afac5ce-ad45-4550-94e5-c9190f6d5d12_1600x1000.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The interface includes a song queue, search, and a web portal for adding songs. A QR code on the screen lets guests scan it and submit song requests from their phones. I even added a WiFi QR code so family members who are not yet on the network can join it and reach the web server. At a party, this means anyone can queue up a song without touching the main computer.</p><p>For lyrics display, you can choose between a scrolling view and a bouncing-ball mode. Getting the ball to arc naturally between syllables took some back and forth with Claude Code. The initial implementation simply moved the ball directly over each syllable, depriving the audience of that classic &#8217;90s karaoke vibe. As soon as I asked Claude how it had implemented the animation, it became clear that I had given it the wrong instructions. 
Asking Claude to explain its implementation was a handy shortcut that spared me from digging through the animation code myself, and once I corrected my instructions, the result was a proper bouncing ball over each lyric.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cUza!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cUza!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cUza!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cUza!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cUza!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cUza!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg" width="540" height="349" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:349,&quot;width&quot;:540,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cUza!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cUza!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cUza!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cUza!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35a365b9-f85a-4b16-8ba1-d925c40de446_540x349.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>You can also set a YouTube video as the background for an individual song. The video plays muted behind the lyrics while the KAR file provides the actual audio. The video isn&#8217;t perfectly synchronized with the music, but it&#8217;s surprisingly close, and having the music video play while you sing adds to the atmosphere.</p><h1>Soundfonts</h1><p>Here&#8217;s something I didn&#8217;t anticipate: the choice of soundfont dramatically affects the karaoke experience. Since the piano part goes to the player piano, the soundfont only affects the backing tracks (drums, bass, strings, etc.). But those backing tracks set the entire mood.</p><p>My first attempt used MusyngKite, a commonly recommended free soundfont. Everything sounded like a video game soundtrack. The electric guitar in particular sounded awful: thin and synthetic in a way that clashed badly with the real acoustic piano. 
Karaoke is supposed to feel like singing with a band, not with a Nintendo Famicom (not to bash <a href="https://en.wikipedia.org/wiki/Karaoke_Studio">Karaoke Studio</a> stans).</p><p>I tried FluidR3 next, which is larger (~140MB) and widely used. It sounded much better: the instruments had more body, and the backing tracks no longer fought with the acoustic piano for attention.</p><p>Then I found the <a href="https://musical-artifacts.com/artifacts/3375">General Montage</a> SoundFont by Daindune, which weighs in at about 1.5GB. It&#8217;s built from samples from Versilian Studios, Freepats, and other sources, with 128 instruments and eight drum kits. This one sounds significantly better than FluidR3. The instruments have more presence and realism, which matters when they&#8217;re playing alongside a real acoustic piano.</p><p>Here is a comparison using &#8220;Let It Snow&#8221; (a royalty-free classic):</p><p><strong>FluidR3:</strong></p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;bf3e696a-155c-4a3d-a324-9863f620804a&quot;,&quot;duration&quot;:30.040815,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p><strong>General Montage:</strong></p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;a7596349-21ed-4e1a-adf5-a923c266d799&quot;,&quot;duration&quot;:30.040815,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p>The difference is most noticeable in the brass and strings: General Montage sounds warm where FluidR3 sounds thin.</p><p>The app ships with FluidR3 and MusyngKite as built-in options, and lets you load custom SF2 files at runtime to try them out mid-session. If you want to go down this rabbit hole yourself, the Internet Archive has a collection of 500 GM-compatible soundfonts worth exploring. 
The quality varies wildly, but it&#8217;s a good way to find something that matches your taste and your song library.</p><h1>Finding KAR Files</h1><p>KAR files aren&#8217;t as easy to find as MP3s, but there are several repositories worth knowing about:</p><ul><li><p><a href="http://midkar.com">midkar.com</a> has over 43,000 MIDI and KAR files, organized by genre</p></li><li><p><a href="http://freemidis.net">freemidis.net</a> offers around 6,400 free MIDI and karaoke files</p></li><li><p><a href="http://karaokeden.com">karaokeden.com</a> has free MIDI karaoke in multiple languages</p></li></ul><p>File quality varies too. Some files have well-sequenced piano parts that translate beautifully to a player piano. Others have the piano buried in a mix of instruments or missing entirely. There was something deeply unsettling about hearing the guitar part remapped onto the player piano in Elvis&#8217;s <em>Can&#8217;t Help Falling in Love</em>.</p><h1>Try It Yourself!</h1><p>The first song I tested was, of course, <em>Piano Man</em>. It worked phenomenally well. There&#8217;s something special about Billy Joel&#8217;s piano part played on real hammers and strings while you belt out the lyrics. 
A real piano captures the dynamics in a way that a software synth never could.</p><h3>The Beatles - Yesterday</h3><div id="youtube2-Jnw3HJlaiH0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Jnw3HJlaiH0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Jnw3HJlaiH0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h3>Billy Joel - Piano Man</h3><div id="youtube2-U5d-tsAz7s0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;U5d-tsAz7s0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/U5d-tsAz7s0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h3>Chicago - 25 or 6 to 4</h3><div id="youtube2-8g0aJ0JDWC4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;8g0aJ0JDWC4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/8g0aJ0JDWC4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Both projects are open source:</p><p>MIDI Piano Pi Server (The Raspberry Pi Service):</p><p><a href="https://github.com/DavidWatkins/midi-piano-pi-server/">https://github.com/DavidWatkins/midi-piano-pi-server/</a></p><p>MIDI Karaoke (The Electron App):<br><a 
href="https://github.com/DavidWatkins/midi-karaoke/tree/main">https://github.com/DavidWatkins/midi-karaoke/</a></p><p>Precompiled applications for macOS, Windows, and Linux are available in the releases. If you have a Disklavier or any other MIDI-enabled player piano, I&#8217;d love to hear whether it works for you.</p><h3>The Beatles - Let It Be</h3><div id="youtube2-G8OsiOon3z0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;G8OsiOon3z0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/G8OsiOon3z0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>After 31 years, my dad finally has his synchronized karaoke system. Watching him sing along while the piano plays itself was worth every hour of debugging MIDI channels and hunting for soundfonts.</p><h1>Lessons for Robotics</h1><p>I made the player piano do something it had never done before: a new technological achievement, however humble. But Claude Code alone could not have completed this project. Interfacing with different compute systems, plugging cables together, and debugging subtle timing issues all require high-dimensional perceptual input and high-dimensional, long-horizon, goal-directed output. The amazing thing about humans is our ability to dream up a goal, then dream up a technical solution to achieve it, pushing subgoals and sub-subgoals onto the stack and popping them back off, all to satisfy my dad&#8217;s whimsical request.</p><h2><strong>References</strong></h2><p>[1] Gardner, J., Simon, I., Manilow, E., Hawthorne, C., &amp; Engel, J. (2022). MT3: Multi-Task Multitrack Music Transcription. 
In <em>International Conference on Learning Representations (ICLR)</em>. <a href="https://arxiv.org/abs/2111.03017">https://arxiv.org/abs/2111.03017</a></p><p>[2] Choi, J., &amp; Lee, K. (2023). Pop2Piano: Pop Audio-based Piano Cover Generation. In <em>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, pp. 1&#8211;5. <a href="https://github.com/sweetcocoa/pop2piano">https://github.com/sweetcocoa/pop2piano</a></p><p><em>David Watkins is a Research Lead at the RAI Institute, where he leads teams working on robotic manipulation and foundation models. When not collecting robot demonstration data, he builds karaoke systems to fulfill 31-year-old family dreams.</em></p><p><em>Disclaimer: This is an independent project and is not affiliated with, endorsed by, or sponsored by Yamaha Corporation or any other company mentioned. All product names and trademarks are the property of their respective owners.</em></p>]]></content:encoded></item><item><title><![CDATA[Where the Curves Cross]]></title><description><![CDATA[One of my all-time favorite papers is from Andrew Ng and Michael Jordan: &#8220;On Discriminative vs.]]></description><link>https://whattotelltherobot.com/p/where-the-curves-cross</link><guid isPermaLink="false">https://whattotelltherobot.com/p/where-the-curves-cross</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Fri, 09 Jan 2026 00:11:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aoqA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One of my all-time favorite papers is from Andrew Ng and Michael Jordan: <em>&#8220;<a href="https://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf">On Discriminative vs. 
Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes</a>.&#8221; Advances in Neural Information Processing Systems 14 (2001).</em> It was ten years old when I first read it as a postdoc in 2011, and it&#8217;s 25 years old now. It&#8217;s a beautiful paper, and I keep returning to it because it captures a pattern I see everywhere in machine learning and robotics.</p><p>The paper compares Naive Bayes (remember Naive Bayes? The advent of spam filtering!) and Logistic Regression. Both models are used for discrete classification, and they are closely linked: the parametric form of Logistic Regression follows from the conditional independence assumption that Naive Bayes makes.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P(class | features) \\propto P(features | class)  \\times P(class)&quot;,&quot;id&quot;:&quot;RQIYVWMCZF&quot;}" data-component-name="LatexBlockToDOM"></div><p>Naive Bayes estimates the probabilities on the right-hand side by counting and multiplying. Logistic regression instead estimates the conditional distribution on the left-hand side directly. But what form should that conditional distribution take? As Tom Mitchell explains in <a href="https://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf">section 3.1</a> of his chapter on Naive Bayes and logistic regression, the conditional independence assumptions of Naive Bayes imply that P(class | features) takes exactly the form of the logistic function. We can even compute the logistic weights from the Naive Bayes estimates, which yields exactly the same classifier as Naive Bayes. But we can also estimate the weights directly, for example by choosing the weights that maximize the conditional data likelihood via gradient ascent.</p><p>Here is the key observation: in this sense, logistic regression searches over a larger space of models than Naive Bayes. Naive Bayes builds in more structure. 
Logistic regression is more flexible.</p><h1>The Crossing</h1><p>Ng and Jordan provide theoretical and empirical results showing that this difference leads to two distinct performance regimes. When the dataset is small, the more structured model (Naive Bayes) outperforms Logistic Regression, because its assumptions let it learn from fewer examples: in their analysis, Naive Bayes approaches its asymptotic error with a number of training examples that grows only logarithmically in the number of features, while Logistic Regression may need a number that grows linearly. When the dataset is large, Naive Bayes asymptotes to an error rate that is no lower, and typically higher, than that of Logistic Regression. 
The curves cross.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aoqA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aoqA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png 424w, https://substackcdn.com/image/fetch/$s_!aoqA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png 848w, https://substackcdn.com/image/fetch/$s_!aoqA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png 1272w, https://substackcdn.com/image/fetch/$s_!aoqA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aoqA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png" width="592" height="485" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:592,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28338,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/183860584?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aoqA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png 424w, https://substackcdn.com/image/fetch/$s_!aoqA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png 848w, https://substackcdn.com/image/fetch/$s_!aoqA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png 1272w, https://substackcdn.com/image/fetch/$s_!aoqA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63a6cc3a-8c65-4aaa-804f-855e852d6dc7_592x485.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p><em>In this figure, reproduced from Ng and Jordan&#8217;s paper, the x-axis shows training set size and the y-axis shows classification error (lower is better). The solid line is Naive Bayes; the dotted line is Logistic Regression.</em></p><p>This graph shows that as we add more training data, test error falls for both methods. In the low-data regime, the more structured model wins. As we add more data, the less structured model fits the distribution better and asymptotes to a lower error. So there are two performance regimes: one for low data and one for high data.</p><p>This is the insight that changed how I think about model selection: it isn&#8217;t that Naive Bayes is better and Logistic Regression is worse, or vice versa. Each method is more effective in its own zone. 
The question is which zone you&#8217;re operating in.</p><h1>Why This Matters for Robotics</h1><p>I&#8217;ve found that this pattern comes up constantly in robotics. Models with more structure can outperform with less data (<a href="https://www2.ccs.neu.edu/research/helpinghands/author/robert-platt/">consider the work by Rob Platt and his collaborators on equivariance</a>), especially when that structure is at least approximately correct. Models with less structure need much more data but can asymptote to better performance (i.e., the bitter lesson).</p><p>In my lab reading group, we recently read <em>&#8220;<a href="https://openreview.net/forum?id=3CQ3Vt0v99">Inquire: Interactive Querying for User-Aware Informative Reasoning</a>&#8221; </em>by Tesca Fitzgerald et al. The paper addresses a practical question: what kind of input should an algorithm request from a person to train a skill? The options include demonstrations (showing the robot what to do), preferences (choosing between two trajectories), corrections (modifying a trajectory), and binary questions (is this trajectory okay?).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k7Sw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k7Sw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png 424w, https://substackcdn.com/image/fetch/$s_!k7Sw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png 848w, 
https://substackcdn.com/image/fetch/$s_!k7Sw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png 1272w, https://substackcdn.com/image/fetch/$s_!k7Sw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k7Sw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png" width="653" height="463" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:463,&quot;width&quot;:653,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:91621,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/183860584?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k7Sw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png 424w, https://substackcdn.com/image/fetch/$s_!k7Sw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png 
848w, https://substackcdn.com/image/fetch/$s_!k7Sw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png 1272w, https://substackcdn.com/image/fetch/$s_!k7Sw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb0553de-df0f-45a8-be51-93c523c9ca47_653x463.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>In this graph, quoted from Tesca&#8217;s paper, the X-axis is the number of queries and the Y-axis is the distance from the optimal policy (lower is better). Each line shows a different query type.</em></p><p>We see the same crossing phenomenon. The Demos-Only line corresponds to using behavior cloning alone. It plateaus quickly but does not reach optimal performance. The Preferences-Only line corresponds to asking a person which of two trajectories they prefer, which is closer to on-policy RL with a dense reward function. This method improves more slowly, so with few queries, behavior cloning does better. With more than 10 queries, however, the preference-based approach outperforms behavior cloning because it is free to search outside the demonstrations for an optimal policy.</p><p>The method described in the paper, INQUIRE, gets the best of both worlds: it relies on demonstrations to perform well with little data, then uses preferences to reach a lower asymptotic error. With only 20 demonstrations in total, we are far from the &#8220;large data regime&#8221; on this task, which makes the hybrid approach particularly valuable.</p><h1>A Framework for Thinking About Model Selection</h1><p>Observing where the curves cross moves us from black-and-white thinking toward recognizing a spectrum of approaches, with trade-offs at each end. This perspective opens research questions beyond &#8220;more data.&#8221; The questions become: Does your task have a well-defined structure that you can encode in a model? Do you have lots of data and compute? And, perhaps most interesting: how can we find the right structure for robotics problems that enables data- and compute-efficient learning without sacrificing asymptotic performance?</p><p>Thanks to Jessica Hodgkins for comments on earlier drafts of this post. Any errors that remain are our own. 
</p>]]></content:encoded></item><item><title><![CDATA[2025 in Review: The Year of Showing Up]]></title><description><![CDATA[This year, I joined a band, shot principal photography for a murder mystery, and became militant about ending talks on time.]]></description><link>https://whattotelltherobot.com/p/2025-in-review-the-year-of-showing</link><guid isPermaLink="false">https://whattotelltherobot.com/p/2025-in-review-the-year-of-showing</guid><dc:creator><![CDATA[David Watkins]]></dc:creator><pubDate>Wed, 31 Dec 2025 18:09:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b347a933-8c61-4b6c-bafb-43bfcf7548f2_817x427.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This year, I joined a band, shot principal photography for a murder mystery, and became militant about ending talks on time. It was a good year.</p><p>Looking back at 2025, I&#8217;m struck by how much of it involved being in rooms I hadn&#8217;t been in before: conference stages, wastewater treatment plants, tiny six-seater planes, and rehearsal spaces where I was singing in a band for the first time. If there&#8217;s a thread connecting everything, it&#8217;s that I kept saying yes to things that excited me, and most of them turned out better than expected.</p><p>Here&#8217;s how it went.</p><h1>Conferences &amp; Community</h1><h2>NEMS 2025</h2><p>In April, I co-organized the New England Manipulation Symposium with Lael Odhner and Kaitlyn Becker. 
My job was coordinating paper acceptances, scheduling speakers, helping Kait wrangle the space at MIT, and giving a talk about the future of intelligent robotics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7teJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7teJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7teJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7teJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7teJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7teJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg" width="1456" height="457" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:457,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:214378,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/183076445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7teJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7teJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7teJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7teJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f12303-3043-43a9-9000-34ab75799464_1954x613.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The thing I&#8217;m most proud of? We ended ten minutes early. I was militant about cutting off talks that ran over, and people actually thanked me for it. The thing I would change next time is leaving more time for hallway conversations between talks. The whole day came together beautifully: a packed room, great energy, and a group photo where everyone looks genuinely happy to be there.</p><h2>GTC in March</h2><p>I was invited to NVIDIA&#8217;s GPU Technology Conference to meet with others in the field. The keynotes were worth attending in person, and it was good to reconnect with colleagues working on similar problems.</p><h2>CoRL in Seoul</h2><p>I didn&#8217;t present at CoRL this year, but I attended to see what&#8217;s happening in the field. 
The vibe was very much &#8220;everyone is collecting data for robotics.&#8221; There&#8217;s been a notable shift back to hardware, not in the sense of building better robots, but in creating better teleoperation and data collection systems. Computer scientists who used to focus purely on software are now designing hardware for data acquisition.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ORr6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ORr6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png 424w, https://substackcdn.com/image/fetch/$s_!ORr6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png 848w, https://substackcdn.com/image/fetch/$s_!ORr6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png 1272w, https://substackcdn.com/image/fetch/$s_!ORr6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ORr6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png" width="1456" height="818" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ORr6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png 424w, https://substackcdn.com/image/fetch/$s_!ORr6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png 848w, https://substackcdn.com/image/fetch/$s_!ORr6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png 1272w, https://substackcdn.com/image/fetch/$s_!ORr6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6395a920-b075-4ddf-b23a-57cafd956d21_1916x1077.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I came away with mixed feelings. On one hand, the bet on data-driven methods is clearly accelerating. On the other hand, I saw a lot of companies selling multi-degree-of-freedom hands and humanoid robots without clear plans for how to train them or what problems they&#8217;d actually solve: solutions looking for problems. There&#8217;s also a pervasive issue of overinflated claims, with researchers presenting general-purpose capabilities that aren&#8217;t yet supported by what their robots can actually do.</p><p>One talk that stuck with me was Sangbae&#8217;s, which critiqued the field&#8217;s lack of understanding of fundamental problems and encouraged deeper reflection on what we&#8217;re actually trying to achieve. 
It echoed something I&#8217;ve been thinking about: the field would benefit from more focus on high-quality data and clearly defined success metrics for specific problems, rather than broad, unsolvable goals like &#8220;solving manipulation.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LbDn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LbDn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!LbDn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!LbDn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!LbDn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LbDn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png" width="1456" height="1092" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LbDn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!LbDn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!LbDn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!LbDn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074e7d53-9f69-4f65-8dff-07cf3eaab6cb_1600x1200.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>While I was in South Korea, a colleague suggested we visit the DMZ. 
Standing at the border, seeing the two countries side by side, was sobering in a way that&#8217;s hard to articulate.</p><h2>Columbia Robotics Hackathon</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pTKA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pTKA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png 424w, https://substackcdn.com/image/fetch/$s_!pTKA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png 848w, https://substackcdn.com/image/fetch/$s_!pTKA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png 1272w, https://substackcdn.com/image/fetch/$s_!pTKA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pTKA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png" width="1456" height="436" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:436,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2658943,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/183076445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pTKA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png 424w, https://substackcdn.com/image/fetch/$s_!pTKA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png 848w, https://substackcdn.com/image/fetch/$s_!pTKA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png 1272w, https://substackcdn.com/image/fetch/$s_!pTKA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f7fc2b-6a92-4022-9bb5-3e83d0d54f8a_2056x615.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In November, I returned to Columbia as a judge for the <a href="https://www.makecu.dev/">MakeCU hackathon</a>. What struck me was the sheer growth from last year. Students came from out of town to participate, and the projects were ambitious. One team finally implemented something I&#8217;d dreamed about in college: a smart lock compatible with dorm rooms. 
Seeing students solve problems I&#8217;d only imagined was a highlight.</p><h2>Lions in AI Panel</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uxua!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uxua!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!uxua!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!uxua!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!uxua!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uxua!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png" width="1456" height="1092" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uxua!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!uxua!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!uxua!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!uxua!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d255c3e-5334-459e-a68b-17b7f4ad7f66_1600x1200.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Earlier this year, I joined a panel at the Columbia Alumni Association of Boston alongside fellow Columbia alumni working in AI. <a href="https://sites.bu.edu/barnet-sherman/">Barnet Sherman</a> moderated a discussion about AI and automation across different industries. I represented the robotics perspective. The audience questions were sharp, and I left feeling good about helping demystify what&#8217;s actually happening in the field right now.</p><h2>Dr. 
Waku Interview</h2><div id="youtube2-89P2Kckr2RQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;89P2Kckr2RQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/89P2Kckr2RQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>I was <a href="https://www.youtube.com/watch?v=89P2Kckr2RQ">featured on Dr. Waku&#8217;s YouTube channel</a> to discuss the &#8220;ChatGPT moment&#8221; in robotics, or rather, why we haven&#8217;t had one yet. We talked about what robotics and AI still need before we see the kind of breakthrough that makes everything feel different.</p><h1>Writing</h1><h2>Launching the Blog</h2><p>Stefanie Tellex and I had been collaborating for a while. She&#8217;s a professor at Brown University who studies how robots understand language, which made her the perfect co-conspirator for a blog about what to tell them. We published &#8220;A Survey of Robotic Language Grounding: Tradeoffs Between Symbols and Embeddings&#8221; at IJCAI in 2024, and then George Konidaris asked us to write a chapter for his upcoming book, <em>Designing an Intelligence</em>. We published that chapter, <em>Elephants Don&#8217;t Write Sonnets: The Physically Grounded Turing Test</em>, earlier this year.</p><p>We discovered that we genuinely enjoyed writing together, so we kept going.</p><p>In August, we officially launched <a href="https://whattotelltherobot.com/">What to Tell the Robot</a> (&#8220;What&#8221; for Watkins and &#8220;Tell&#8221; for Tellex) with the publication of our book chapter. 
The blog has become a space for us to work through ideas about robotics, AI, and the things we think matter.</p><h2>Elephants Don&#8217;t Write Sonnets</h2><p>Our <a href="https://whattotelltherobot.com/p/elephants-dont-write-sonnets">flagship post</a> lays out the thesis of our book chapter: the original Turing Test is no longer sufficient, and we need a new benchmark, which we call the Physically Grounded Turing Test. The argument starts from a simple observation: elephants don&#8217;t play chess or write sonnets, yet we all agree they&#8217;re intelligent. True intelligence requires embodiment, perception, and action in the physical world, not just the manipulation of language.</p><h2>The Deer Island Marvel</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p_Go!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p_Go!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png 424w, https://substackcdn.com/image/fetch/$s_!p_Go!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png 848w, https://substackcdn.com/image/fetch/$s_!p_Go!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png 1272w, 
https://substackcdn.com/image/fetch/$s_!p_Go!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p_Go!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p_Go!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png 424w, https://substackcdn.com/image/fetch/$s_!p_Go!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png 848w, https://substackcdn.com/image/fetch/$s_!p_Go!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png 1272w, 
https://substackcdn.com/image/fetch/$s_!p_Go!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3746f9e-143b-495a-ac17-93a8346f37aa_1456x1092.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>One of my favorite pieces this year was about <a href="https://whattotelltherobot.com/p/the-deer-island-marvel">visiting the Deer Island Wastewater Treatment Plant</a>. Stefie and I toured this $3.8 billion facility, which processes 360 million gallons of wastewater daily, and I couldn&#8217;t stop thinking about the robotics opportunities hiding in plain sight. 
The engineering is staggering: the plant has 12 egg-shaped digesters, each 90 feet in diameter. Yet much of the inspection and maintenance is still manual. Workers enter the digesters on rafts to clean them, and plastic is removed by hand. There&#8217;s real work to be done here.</p><h1>Adventures</h1><h2>Culebra</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iiV2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iiV2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png 424w, https://substackcdn.com/image/fetch/$s_!iiV2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png 848w, https://substackcdn.com/image/fetch/$s_!iiV2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png 1272w, https://substackcdn.com/image/fetch/$s_!iiV2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iiV2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png" width="1456" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2823975,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/183076445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iiV2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png 424w, https://substackcdn.com/image/fetch/$s_!iiV2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png 848w, https://substackcdn.com/image/fetch/$s_!iiV2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png 1272w, https://substackcdn.com/image/fetch/$s_!iiV2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa095778a-4ce4-4c69-9191-a802591c0bb0_2229x616.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Despite having spent, collectively, more than a year of my life in Puerto Rico, I&#8217;d never made it to Culebra until this August. We took a six-seater plane, and I am pretty sure neither the runway nor the plane was excited about the combined weight of my six-member immediate family and two dogs. The island is tiny and quiet, with a small fraction of the population of Key West, which I&#8217;d visited for the first time that March. 
The views were beautiful, though, and celebrating my mom&#8217;s birthday there was worth the questionable takeoff and landing.</p><h2>Quebec City by EV</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bFvI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bFvI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png 424w, https://substackcdn.com/image/fetch/$s_!bFvI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png 848w, https://substackcdn.com/image/fetch/$s_!bFvI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png 1272w, https://substackcdn.com/image/fetch/$s_!bFvI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bFvI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png" width="1456" height="473" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:473,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1999798,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/183076445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bFvI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png 424w, https://substackcdn.com/image/fetch/$s_!bFvI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png 848w, https://substackcdn.com/image/fetch/$s_!bFvI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png 1272w, https://substackcdn.com/image/fetch/$s_!bFvI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd82060-376d-490a-b754-39ebbb1df1b9_1886x613.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I drove to and from Quebec City in an electric vehicle this year. Quebec was beautiful in August, and I got to see some stunning natural sights. With my girlfriend, Amelia, I discovered the wonders of using ChatGPT as a tour guide. I was truly in awe that we could take a picture of anything, learn detailed information about it, and, in the same conversation, ask where to find the best food nearby. Traveling by EV, however, was by far the worst decision of the trip: getting there took us 12 hours instead of 6, even though Canada is extremely EV-friendly.</p><h2>Facts &amp; Figures</h2><p>My coworker Kevin Karol wrote a murder mystery dance party called <em>Facts &amp; Figures</em>, which premiered at the Boston Fringe Festival. 
I did the principal photography.</p><p>I learned more than I expected: how to work with stage lighting, how to find better angles on the principal actors, how to time shots during a live performance, and how to edit images shot in low light. The experience also gave me an entirely new perspective on the acting I did at the Edinburgh Fringe Festival 15 years ago. I&#8217;d love to do it again.</p><h2>RAI Band</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UNpz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UNpz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png 424w, https://substackcdn.com/image/fetch/$s_!UNpz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png 848w, https://substackcdn.com/image/fetch/$s_!UNpz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png 1272w, https://substackcdn.com/image/fetch/$s_!UNpz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UNpz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png" 
width="1456" height="1096" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1096,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UNpz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png 424w, https://substackcdn.com/image/fetch/$s_!UNpz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png 848w, https://substackcdn.com/image/fetch/$s_!UNpz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png 1272w, https://substackcdn.com/image/fetch/$s_!UNpz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4f6a18-64f0-4090-b24e-6395eaaf8f7a_1916x1442.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>My company started a band this year, and I volunteered for vocals. I&#8217;m in the rock group within the band, and we&#8217;ve been practicing &#8220;Tribute&#8221; by Tenacious D. I love singing with a group and rehearsing the songs, though I have a lot of room to grow. There&#8217;s something satisfying about learning a skill that has nothing to do with your day job.</p><h2>Piano</h2><p>I&#8217;ve kept up with my piano lessons with Tatiana Bercu. It&#8217;s been gratifying to finally be able to play <em>Gottes Zeit ist die allerbeste Zeit</em> by Bach. Sight-reading gets easier every week!</p><h1>What I Learned</h1><p>If I had to distill this year into a lesson, it would be about the importance of multidisciplinary teams and the value of implementing software yourself.</p><p>In the 2000s and 2010s, software had something no technology had before: virtually free replication and distribution to customers. 
No physical media were needed, so many talented people were drawn to building software companies. Those same people are now looking at robotics.</p><p>On the flip side, many robotics companies have focused primarily on hardware, building out the mechanical systems and expecting customers to figure out how to program them. Neither extreme, software-first or hardware-first, works as well as you&#8217;d hope.</p><p>What I&#8217;ve seen work, both this year and over my career, is bringing multidisciplinary teams together from day one. Having hardware people talk to software people from the start produces better robots. Even better, as a manager, I now have AI tools that help me keep writing software myself. Implementing software yourself, even with AI assistance, is how you keep learning. There is something irreplaceable about staying hands-on: you understand problems differently when you&#8217;ve built even part of the thing yourself.</p><h1>Looking Ahead to 2026</h1><p>I&#8217;m looking forward to building greater things.</p><p>That&#8217;s vague, I know. But after a year of showing up in new places and saying yes to unfamiliar challenges, I have a clearer sense of what I want to build and who I want to build it with. The blog will keep growing. The research will continue. And I&#8217;ll probably say yes to a few more things that scare me.</p><p>Thanks for reading. If you want to follow along, subscribe to <a href="https://whattotelltherobot.com/">What to Tell the Robot</a>, and I&#8217;ll see you in the new year.</p><p><em>I want to thank Stefanie Tellex and Reena Leone for reviewing this post before publication and for their constructive suggestions. 
</em></p>]]></content:encoded></item><item><title><![CDATA[The Deer Island Marvel]]></title><description><![CDATA[Wastewater treatment and engineering excellence]]></description><link>https://whattotelltherobot.com/p/the-deer-island-marvel</link><guid isPermaLink="false">https://whattotelltherobot.com/p/the-deer-island-marvel</guid><dc:creator><![CDATA[David Watkins]]></dc:creator><pubDate>Fri, 05 Dec 2025 12:10:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ALMP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ALMP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ALMP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!ALMP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!ALMP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ALMP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ALMP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ALMP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!ALMP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!ALMP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ALMP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff58addbc-655a-454b-9c80-36be66a9f36c_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>Two of the twelve anaerobic digesters that our tour group stood in the middle of</em></figcaption></figure></div><p>The Deer Island Wastewater Treatment Plant is something that many eastern Massachusetts residents take for granted. 
But as Stefie and I toured the massive, $3.8 billion facility, it became clear that this is an engineering marvel. Some editorializing: I had never seen Stefie so excited about a field trip, even when the smell became almost unbearable for me. And honestly, it&#8217;s hard not to get excited when you&#8217;re looking at a facility that shows how well-applied engineering principles can solve significant societal challenges.</p><h2><strong>The Scale of What We&#8217;re Missing</strong></h2><p>Deer Island processes an average of 360 million gallons of wastewater daily for the greater Boston area, with peaks reaching 1.3 billion gallons per day. The facility spans 365 acres and provides sewage treatment for 43 communities, home to about 34 percent of Massachusetts&#8217; population. Walking through the primary treatment tanks, secondary clarifiers, and the impressive egg-shaped digesters, you&#8217;re confronted with the sheer scale of the infrastructure that keeps modern society functioning, and struck by how much of it still relies on manual inspection and maintenance.</p><p>The plant&#8217;s architecture is breathtaking. The twelve egg-shaped anaerobic digesters, each 90 feet in diameter and 110 feet tall, dominate the skyline and hold 3 million gallons apiece. These structures alone cost hundreds of millions of dollars to construct and require constant monitoring to maintain optimal conditions for breaking down organic matter and producing methane that powers 20% of the plant. Yet much of the inspection work is still done by humans, who climb into confined spaces, work in hazardous environments, and manually check equipment. Our tour guides explained the costly process of emptying a digester of sludge and refilling it with water. Workers then enter on rafts to inspect and clean the inside. (We were asked repeatedly to avoid using flushable wipes. Flushable wipes are not flushable!)</p><p>The engineering complexity is staggering. 
The plant uses 48 primary clarifiers, each 186 feet long by 41 feet wide by 24 feet deep, with &#8220;stacked&#8221; settling surfaces at mid-depth that double the settling capacity within Deer Island&#8217;s tight space constraints. Over one hundred tons of pure oxygen are manufactured each day at the facility&#8217;s cryogenic plant to support the biological treatment process, raising pollution removal to over 85%.</p><p>Beyond the impressive scale of waste processing, Deer Island became an unexpected frontline in pandemic surveillance during COVID-19. The Massachusetts Water Resources Authority (MWRA) partnered with Cambridge-based Biobot Analytics to track wastewater at Deer Island for COVID-19 infection indicators, with samples analyzed daily. Because the facility processes wastewater from 43 communities across eastern Massachusetts, these samples provide a comprehensive view of viral spread in the Greater Boston area.</p><p>What made this approach particularly valuable was that wastewater surveillance could detect rising virus levels several days before positive test counts started to increase, serving as an early warning system for community outbreaks. Unlike traditional case counts, COVID-19 data from sewage measured virus prevalence in the community at large, including among people who didn&#8217;t have symptoms and didn&#8217;t get tested, since the virus they shed through bodily waste contributes to the levels found in sewage. The plant&#8217;s COVID-19 traces provide public health officials with critical data for policy decisions.</p><h2><strong>The Plastic Problem</strong></h2><p>One of the most striking observations during our tour was the amount of plastic waste that accumulates at various stages of the treatment process. Despite screens and filters, plastic debris (everything from bottle caps to grocery bags) constantly surfaces in the treatment tanks. 
Workers currently remove this material manually, a labor-intensive process.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/581b91a8-2cd9-4f2f-8b64-30d3db6ca886_1916x1077.jpeg&quot;},{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6bd2f22-017b-4b6a-a001-9e7aa041398f_1376x1835.jpeg&quot;}],&quot;caption&quot;:&quot;Pictures of the collection facility with bits of plastic that are manually collected&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31a131e2-97d2-4b53-bbd6-8e42bcdba6de_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p>This isn&#8217;t just about automation for efficiency&#8217;s sake. It is about creating solutions where none currently exist at scale. Plastic removal at wastewater facilities is a good example of a problem humans have long struggled with that automation could help solve.</p><p>The trip opened our eyes to the many potential applications of robotics to automate processes that are heavily manual today. One of our colleagues, Howie Choset, co-founded a new venture, <a href="https://www.pipeforce.ai/">Pipe Force AI</a>. Instead of building another general-purpose robot, Pipe Force AI focuses explicitly on robotic inspection of storm sewer pipes. Storm sewers are critical infrastructure that require regular inspection to prevent flooding and environmental damage, yet current inspection methods are dull, dirty, and dangerous.</p><p>We talk a lot about the Bitter Lesson on our blog, and even a focused venture like this one benefits from a general method that applies to many different problems. 
Inspecting pipes in municipalities at 900 ft/hr, Pipe Force AI is developing technology that could aid in inspecting miles of pipe leading into the wastewater treatment plant.</p><h2><strong>The Infrastructure Opportunity</strong></h2><p>What excites me most about visiting places like Deer Island is realizing how much critical infrastructure operates with minimal automation. Water treatment, wastewater processing, and stormwater management offer us, as roboticists, opportunities to find new ways to motivate our research and explore new possibilities.</p><p>These aren&#8217;t glamorous applications: Deer Island is dirty and dangerous (but definitely not dull!). There are no viral videos of robots cleaning grease from clarifier tanks or inspecting the inside of digester vessels. But these applications represent precisely the kinds of problems where robotics can create genuine value: capabilities that humans haven&#8217;t yet unlocked.</p><h2><strong>A Monument to Engineering Excellence</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y93_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y93_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!y93_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png 848w, 
https://substackcdn.com/image/fetch/$s_!y93_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!y93_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y93_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y93_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!y93_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png 848w, 
https://substackcdn.com/image/fetch/$s_!y93_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!y93_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0681f46a-b8e2-48f5-b806-28bcf3258d35_1600x1200.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Stefie (left), David (middle), and Stephen Hart (right) on top of one of the digesters with the Boston
skyline</em></figcaption></figure></div><p>The smells hit you in waves as you walk through different sections of the facility. Air scrubbers and carbon adsorbers continuously remove odors and volatile organic compounds from treatment process &#8220;off-gases,&#8221; covering primary and secondary treatment facilities, sludge processing, and grit removal. Still, the primary clarifiers, where gravity separates sludge and scum from incoming wastewater, are where the tour gets most aromatic. Despite the tour guides&#8217; repeated reminders about the air permit, our noses kept reminding us exactly what the facility processes.</p><p>The facility&#8217;s transformation of Boston Harbor represents one of America&#8217;s greatest environmental success stories. Before the new plant opened in 2000, the system experienced combined sewer overflows on an average of 60 days per year, with about 10 billion gallons per year of untreated sewage flowing into Boston Harbor. By the 1960s, Boston Harbor was covered in a deep sludge resembling molasses.</p><p>The engineering challenges were immense. Wastewater from the 43 communities reaches the plant via four underground tunnels and is then pumped about 150 feet to the treatment facilities. Gravel from these tunnels was used to line the bases of the sludge digesters. The project was tragically marked by the deaths of two divers working in the narrowing, anoxic outfall tunnel ten miles from land during construction of the 9.5-mile underwater discharge system.</p><p>Even today, the facility continues to face operational challenges. As recently as August 2019, Deer Island had to run on backup power for several days at a cost of about $30,000 per day during installation of a new $115 million power cable across Boston Harbor. 
The original cable had been installed too shallowly three decades earlier, violating federal permits and eventually blocking harbor dredging operations.</p><p>Standing among those iconic egg-shaped digesters, watching the complex choreography of pumps, clarifiers, and biological treatment systems processing hundreds of millions of gallons daily, you witness not just waste treatment but a monument to what ambitious engineering can accomplish. The methane captured from digestion powers boilers that heat the entire facility and drive steam turbine generators producing an average of 3 megawatts of electricity. Digested sludge leaves the island through the Inter-Island Tunnel to be processed at the Fore River facility into Bay State Fertilizer!</p><p>The numbers are staggering, the engineering is brilliant, and the environmental impact is transformative. If the plant stopped processing waste, our toilets would start backing up with sewage within a day. Deer Island and projects like it transformed Boston Harbor from the dystopia in Neal Stephenson&#8217;s novel <a href="https://en.wikipedia.org/wiki/Zodiac_(novel)">Zodiac</a> into the idyllic, beautiful ecosystem and working harbor it is today. But what struck us most during our tour was how much of this critical infrastructure still relies on manual processes that could benefit from robotic automation. 
From plastic removal to equipment inspection, Deer Island represents not just an engineering triumph but a window into the automation opportunities that await us in the infrastructure we depend on every day.</p>]]></content:encoded></item><item><title><![CDATA[Qualitative Simulation of Swine Production]]></title><description><![CDATA[Lessons from Matt Mason&#8217;s Undergraduate Thesis]]></description><link>https://whattotelltherobot.com/p/qualitative-simulation-of-swine-production</link><guid isPermaLink="false">https://whattotelltherobot.com/p/qualitative-simulation-of-swine-production</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Wed, 19 Nov 2025 00:11:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0Mfu!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee6e279-53b0-4949-804e-4f7aa106f40a_727x727.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>More than a decade ago, I ran into Jerry Sussman&#8217;s office, bursting with excitement because I had met <a href="https://mtmason.com/">Matt Mason</a> at a conference. I was a lowly postdoc, and Matt was the director of the Robotics Institute at Carnegie Mellon University. Matt and I had lunch together at Chez Ashton<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> in Quebec City! Jerry keeps copies of all his students&#8217; theses on his bookshelves, so he immediately pulled out Matt&#8217;s undergraduate thesis, entitled &#8220;<a href="https://h2r.cs.brown.edu/wp-content/uploads/mason76.pdf">Qualitative Simulation of Swine Production</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>,&#8221; written in 1976. 
For some reason, this work was never archived as an MIT tech report, so I scanned it and sent it back to Matt, but more than a decade later, it still hasn&#8217;t made it into the academic record. </p><p>In these deep, dark days, it is instructive to look backwards, and Matt&#8217;s thesis is a wonderful example of &#8220;Good Old Fashioned AI.&#8221; He uses a domain-specific language to simulate a concrete domain using lifted symbolic expressions in an expert-system approach. Specifically, he models swine production, building on his experience on his family&#8217;s pig farm. Matt writes, &#8220;The greatest difficulty in writing the hog-farm simulation was the representation of time,&#8221; an early recognition of the importance of space and time in modeling real-world problems. In fact, he is gesturing at an often misunderstood feature of human language: it can express goals as well as actions or trajectories. LLM approaches to language understanding output actions or trajectories, but a goal is often what a person means. For example, consider a toy problem such as &#8220;Go to the red room.&#8221; The robot might need to open a door to get to the red room. Unfortunately, the door is locked, so it needs to find a key. But the lock has seized, so it needs WD-40 to loosen the rust. There is no WD-40, so now it is on its way to the hardware store, but to get there it needs to find the car keys, all to get into the red room. (Not that this happened to me recently...) A goal specifies an end state; it is the robot&#8217;s job to figure out how to achieve that state, which may require taking arbitrary actions along the way. There is a stack of subgoals involved. 
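</p><p>To make the stack concrete, here is a minimal Python sketch of the red-room story as backward chaining over subgoals. The subgoal table and the <code>plan</code> function are hypothetical illustrations, not Matt&#8217;s DSL or any particular planner: each goal is pushed onto a stack, its prerequisites are pushed on top of it, and a goal is emitted only once everything it depends on has been handled.</p>

```python
# Hypothetical subgoal table for the red-room story: each goal maps to the
# prerequisites that must hold first. Assumes the table has no cycles.
SUBGOALS = {
    "in red room": ["door open"],
    "door open": ["door unlocked"],
    "door unlocked": ["have door key", "lock lubricated"],
    "lock lubricated": ["have WD-40"],
    "have WD-40": ["at hardware store"],
    "at hardware store": ["have car keys"],
}


def plan(goal, already_true=()):
    """Expand a goal into an ordered list of steps using an explicit stack.

    A goal is emitted only after all of its prerequisites (post-order),
    so the deepest prerequisite comes first. Goals that already hold are
    skipped without being expanded.
    """
    done = set(already_true)
    steps = []
    stack = [(goal, False)]  # (goal, prerequisites_already_pushed)
    while stack:
        current, expanded = stack.pop()
        if current in done:
            continue
        if expanded:
            # All prerequisites are handled, so this goal can be achieved now.
            steps.append(current)
            done.add(current)
            continue
        # Revisit this goal after its prerequisites have been processed.
        stack.append((current, True))
        for sub in reversed(SUBGOALS.get(current, [])):
            stack.append((sub, False))
    return steps


print(plan("in red room"))
```

<p>Calling <code>plan("in red room")</code> returns the whole chain, starting with the door key and the car keys and ending with the red room, while <code>plan("in red room", {"door unlocked"})</code> returns only opening the door and entering, since subgoals that already hold are never expanded. A real planner would also have to handle cycles and subgoals that fail, which this sketch ignores.</p><p>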
In contrast, an action such as &#8220;drive 1 meter north&#8221; translates more directly to a motor command (though of course this is just a goal at another level, specifying a target for a motor controller to achieve relative to the odometry sensor). Similarly, Matt&#8217;s thesis specifies desired end states and an implicit planning tree that connects start states to end states in order to answer questions about the simulation. </p><p>A second feature of this thesis is its use of lifted variables for pattern matching and inference. For example, one of Matt&#8217;s rules is:</p><pre><code>(law vet-cost ((v* vet) hogs rate cost)
  ((= (hogs-present !?v*) !&gt;hogs)
   (= (rate !?v*) !&gt;rate)
   (= (cost !?v*) !&gt;cost))
  (equation 'cost '(sn&amp;* hogs rate)))</code></pre><p>This law introduces a new equation for the vet&#8217;s cost, calculated by multiplying the number of hogs by the vet&#8217;s rate. The law is triggered when its associated variables are defined: hogs-present, rate, and cost. This approach foreshadows STRIPS-style planning, which works via declared preconditions and effects. 
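</p><p>The following Python sketch mimics how a law like this might fire. The fact table, entity names, and the <code>vet_cost_law</code> function are hypothetical, and the product is evaluated directly rather than added as a symbolic equation as in the thesis. The lifted variable ranges over every entity of type vet; checking that hogs-present, rate, and cost are all defined plays the role of a precondition, and the new cost value plays the role of an effect.</p>

```python
# Hypothetical facts: (attribute, entity) -> value. A value of None means the
# quantity is defined but not yet known, as cost is before the law fires.
FACTS = {
    ("type", "dr-smith"): "vet",
    ("hogs-present", "dr-smith"): 40,
    ("rate", "dr-smith"): 2.5,
    ("cost", "dr-smith"): None,
    ("type", "barn-1"): "building",
}

REQUIRED = ("hogs-present", "rate", "cost")


def vet_cost_law(facts):
    """Fire the law for every vet whose required quantities are all defined.

    Returns a dict mapping ("cost", vet) to hogs * rate. This is a
    simplification: the thesis adds a symbolic equation instead of
    computing a number.
    """
    equations = {}
    # Bind the lifted variable to each entity whose type is "vet".
    vets = [entity for (attr, entity), value in facts.items()
            if attr == "type" and value == "vet"]
    for vet in vets:
        # Precondition: every required quantity exists for this vet.
        if all((attr, vet) in facts for attr in REQUIRED):
            hogs = facts[("hogs-present", vet)]
            rate = facts[("rate", vet)]
            # Effect: the vet's cost is the number of hogs times the rate.
            equations[("cost", vet)] = hogs * rate
    return equations


print(vet_cost_law(FACTS))  # {('cost', 'dr-smith'): 100.0}
```

<p>Here the rule fires once, for the single vet, giving a cost of 100.0 (40 hogs times a rate of 2.5). Removing that vet&#8217;s rate fact leaves the precondition unmet, and the rule produces no equations.</p><p>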
Most existing work on behavior cloning and large behavior models uses skills conditioned on language or images. Yet many of the settings where we want our robots to work rely on formal, structured tasks, such as fulfilling an order from a website or assembling a kit for the next model coming down the assembly line. So, in addition to language-conditioned and goal-conditioned skills, we need skills that take formal parameters and make guarantees about the entire parameter space. </p><p>Unstructured natural languages such as English can express the heights of philosophy, the intricacies of science, and the whimsy of folk tales. Formal languages, in contrast, are limited to their precisely specified grammar, syntax, and semantics, plus whatever a programmer can build within those constraints. The Church-Turing thesis tells us that anything computable in any programming language can, one way or another, be computed by a Turing Machine. Yet we still don&#8217;t have a formal language that captures the full power and nuance of English while preserving the precision of a formal language. Still, formal languages, from Python to JAX to Linear Temporal Logic, provide powerful safety guarantees, the ability to safely and robustly compose large systems, and clearly interpretable answers and constraints. Figuring out how to make them play nicely with our neural models is an ongoing challenge!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>My advisor recommended Chez Ashton. 
I thought it was some kind of fancy French place, but actually it was like McDonald&#8217;s but for poutine. Delicious!</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I was excited that &#8220;Elephants Don&#8217;t Write Sonnets&#8221; was a 4-gram not present on Google before our first blog post. Neither is &#8220;Qualitative Simulation of Swine Production&#8221;!</p></div></div>]]></content:encoded></item><item><title><![CDATA[The Grounded Turing Test]]></title><description><![CDATA[Intelligence is multi-faceted.]]></description><link>https://whattotelltherobot.com/p/the-grounded-turing-test</link><guid isPermaLink="false">https://whattotelltherobot.com/p/the-grounded-turing-test</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Tue, 02 Sep 2025 22:11:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8393f0d2-3238-4162-a582-81af1511ac56_420x300.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Intelligence is a multi-faceted concept that seems to evade definition. We would go as far as to say that it is an error even to attempt to define the term; instead, we know it when we see it. Over the past 50 years, there have been many attempts to define what it means to be intelligent, and many computational systems that demonstrate various forms of intelligence. The arguments around what it means to really be intelligent are nothing new: Rod Brooks's paper, <a href="https://people.csail.mit.edu/brooks/papers/elephants.pdf">Elephants Don't Play Chess</a>, was in part a reaction to the fact that chess engines with alpha/beta pruning fail to generate long-horizon robotic behavior in the physical world. 
One reason people are excited about LLMs is that they demonstrate many facets of intelligence at once:  they have some ability to play chess, along with writing blog posts and serving as an in-pocket tour guide. And technologies like <a href="https://www.physicalintelligence.company/blog/pi05">&#960;0.5</a> show promise in expanding these capabilities to the physical world. Our aim in defining the Grounded Turing Test was to be specific about certain capabilities that LLMs don't (yet) have. </p><p>From that perspective, there is a facet that is missing from ChatGPT: embodiment. Crows, elephants, and chimpanzees have behaviors associated with this facet of intelligence, without having spoken natural language at all. From a capabilities perspective, we mean processing high-dimensional, high-framerate sensor input and producing high-dimensional, high-frequency actuator output to produce long-horizon goal-directed behavior in the physical world. We make two claims: 1) embodiment is an important facet of intelligent behavior, and 2) LLMs like ChatGPT lack embodiment. More precisely, embodiment is a spectrum, and LLMs are far on one side of the spectrum, compared to humans, dogs, and crows, and because of this, they fail at a number of behaviors today. </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://whattotelltherobot.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading What to Tell the Robot! 
Subscribe for free to receive new posts and support our work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PLcy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PLcy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png 424w, https://substackcdn.com/image/fetch/$s_!PLcy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png 848w, https://substackcdn.com/image/fetch/$s_!PLcy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png 1272w, https://substackcdn.com/image/fetch/$s_!PLcy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PLcy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png" width="758" height="153" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:153,&quot;width&quot;:758,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20151,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/172612487?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PLcy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png 424w, https://substackcdn.com/image/fetch/$s_!PLcy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png 848w, https://substackcdn.com/image/fetch/$s_!PLcy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png 1272w, https://substackcdn.com/image/fetch/$s_!PLcy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d833fb7-f190-4f36-88e8-c04e2348f845_758x153.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">Embodiment is a spectrum.</figcaption></figure></div><p>Our book chapter enumerates the ways an embodied intelligent robot can use language, and then describes the 
research problems inherent to each. To pass the Grounded Turing Test, a system must support all of these capabilities rather than any single one. It must pass in the physical world, at the very least through a mobile robot equipped with a camera, text input, text output, and manipulators. </p><p>Our work falls into a family of approaches that try to extend the Turing Test to different embodiments. Unlike most previous approaches, we try to be specific about the behaviors and types of language use that are important. Srivastava et al. (2023), in contrast, propose a text-based, non-embodied extension to the Turing Test, called the Beyond the Imitation Game Benchmark (BIG-bench), designed for the age of large language models. BIG-bench consists of 204 text-based tasks from a diverse array of domains, ranging from linguistics to biology to common-sense reasoning. Current LLMs perform poorly on these tasks in an absolute sense; although performance improves with model size, it remains far below the level of human raters (Mirzadeh et al., 2024). But even if (when!) models pass these benchmarks, we argue that because the resulting models will not be embedded in high-dimensional space-time with goal-based behavior, they will not be able to demonstrate many specific behaviors associated with embodiment. </p><p>Rather than exploring all the types of language we enumerate, we'll conclude with a specific example: "Pick up the red block that's on the table." In one sense, this is the simplest and most easily solved task: surely a large behavior model can pick and place a clearly segmented, primary-colored object; &#960;0.5 is already performing much more complex tasks like making beds! But when you consider the embodied version, where the red block and the table aren't already in the field of view, it's much more challenging. Consider picking up the red block that Stefie's three-year-old lost and putting it away before her husband steps on it. 
(Not that this happened recently or anything.) Or "Pick up the kitten." To perform this kind of task, an embodied agent needs an awareness of space and time: it must know where to search, reason about what has and hasn't been searched, and carry out long-horizon behaviors, such as opening a closet door to see if the cat is trapped inside. Rob Brooks gave a related example: an AI that "seems as intelligent, as attentive, and as faithful as a dog." One of the most salient aspects of a dog's attentiveness is its ability and drive to find its person, wherever they may be.</p><p>Finally, we make this claim only about the present moment. We underestimated the power and success of LLMs, so we are reluctant to say this won&#8217;t be solved soon. But we are trained as researchers to identify unsolved problems and then solve them! So what are <a href="https://paulgraham.com/hamming.html">the most important problems</a> in your field? And why aren't you working on them?</p>]]></content:encoded></item><item><title><![CDATA[From Physics to Embodied Intelligence]]></title><description><![CDATA[How Robots Are Finally Getting Interesting]]></description><link>https://whattotelltherobot.com/p/from-physics-to-embodied-intelligence</link><guid isPermaLink="false">https://whattotelltherobot.com/p/from-physics-to-embodied-intelligence</guid><dc:creator><![CDATA[David Watkins]]></dc:creator><pubDate>Wed, 27 Aug 2025 01:31:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Fgls!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>&#8220;Why don&#8217;t we have robots in our homes yet?&#8221;</strong> It&#8217;s a question we&#8217;ve been asked countless times, and for good reason. 
We&#8217;ve had decades of impressive demos of robots walking, hopping, even flipping, but real, useful robots still feel a little out of reach. We have come a long way since the Unimate, but despite the advances in data-driven robotics, we still have a long way to go. </p><p>This post is a journey through this transformation: from legged robots in labs to transformer-driven manipulation systems, and what we still need to solve.</p><div><hr></div><h2><strong>What Is a Robot, Really?</strong></h2><p>Before we dive in, let&#8217;s define our terms. A robot is a machine with sensors, actuators, and compute designed to perform a task in the physical world. It doesn&#8217;t have to look like a person, talk like a person, or even move like one. At its core, a robot senses the world, processes that data, and acts on its environment.</p><p>What it doesn&#8217;t do, at least not yet, is think multimodally about the world. Real-world reasoning requires agents that can process and reason across visual, spatial, and temporal dimensions simultaneously. 
While <a href="https://www.physicalintelligence.company/blog/pi05">Physical Intelligence&#8217;s Pi0.5</a> model demonstrates promising chain-of-thought reasoning in text, the future demands systems that can reason natively in multimodal space, integrating vision, language, and physical understanding into coherent decision-making. Building such systems means moving beyond text-based reasoning chains toward agents that can observe, plan, and act with the same integrated intelligence humans bring to manipulating the world. We define the Grounded Turing Test, which measures intelligence in embodied systems, <a href="https://h2r.cs.brown.edu/wp-content/uploads/tellexwatkins2026.pdf">here</a>. </p><div><hr></div><h2><strong>The Early Days: Pure Physics</strong></h2><p>Rewind to the 1980s and 90s. At MIT&#8217;s Leg Lab, robots like one-legged hoppers and two-legged walkers were controlled using beautifully engineered physics models. These machines had no cameras and no memory: just orientation sensors and code that reacted to basic environmental cues.</p><p>They were brilliant pieces of engineering, but also deeply limited. Everything had to be modeled manually, down to the torque in each joint. The moment something unexpected happened, the robot failed.</p><p>This wasn&#8217;t because roboticists lacked imagination; it was because compute and sensors were nowhere near where they needed to be. Vision? Infeasible. Memory? Forget it.</p><div><hr></div><h2><strong>Teleoperation to the Rescue&#8230; Sort Of</strong></h2><p>By 2010, we saw robots like Willow Garage&#8217;s PR1 manipulating objects and even &#8220;doing dishes.&#8221; But here&#8217;s the catch: all of it was remote-controlled. 
A human operator was pulling the strings, using the robot&#8217;s sensors to guide each motion.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fgls!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fgls!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Fgls!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Fgls!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Fgls!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fgls!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59326,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/168889305?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fgls!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Fgls!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Fgls!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Fgls!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175ebdd7-e4f5-4bdb-98b2-c2da62972469_1200x630.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>These robots were more capable, with cameras and more degrees of freedom, but they still didn&#8217;t &#8220;understand&#8221; their environment. The complexity was in the person behind the screen, not the machine.</p><div><hr></div><h2><strong>Enter Machine Learning</strong></h2><p>Starting in the mid-2010s, we saw a shift: what if robots could <em>learn</em> instead of being programmed for every situation?</p><p>There are two broad approaches here:</p><ul><li><p><strong>Specialized models</strong>: Break the problem into chunks: detect object geometry, plan a grasp, execute the motion.</p></li><li><p><strong>End-to-end learning</strong>: Feed in raw sensor data, and have the model output the right action directly.</p></li></ul><p>Specialized models are easier to debug but don&#8217;t scale. 
End-to-end learning is harder to train but more general. Around 2019&#8211;2020, the field began leaning more heavily toward the latter, especially with the rise of deep neural networks and reinforcement learning.</p><div><hr></div><h2><strong>The Bitter Lesson</strong></h2><p>We firmly subscribe to Richard Sutton&#8217;s &#8220;bitter lesson&#8221;: general methods that scale with compute and data tend to outperform bespoke, hand-engineered solutions in the long run.</p><p>You can get short-term wins by handcrafting your pipeline. However, given enough data and compute, a generalist model will usually win. That lesson has started to reshape robotics. </p><div><hr></div><h2><strong>The Transformer Revolution</strong></h2><p>Transformers, originally developed for machine translation and later popularized by language models like GPT, changed the way we think about processing sequential data. They made it practical to train on vast amounts of unlabeled data (self-supervised learning), and they scaled better than anything we&#8217;d had before.</p><p>In natural language processing, transformers replaced decades of research on grammar, syntax, and hand-crafted rules. In robotics, they&#8217;re enabling systems that map vision and language directly to robotic actions. The transformer era has also revealed that, for some specific problems, CNNs and MLPs can be just as effective when scaled up. 
</p><p>We&#8217;re seeing companies like Google, Tesla, TRI, Physical Intelligence, and Figure use these methods to create surprisingly capable robots:</p><ul><li><p><strong><a href="https://www.figure.ai/news/helix">Figure&#8217;s Helix</a></strong> platform folds laundry using imitation learning.</p></li><li><p><strong><a href="https://deepmind.google/models/gemini-robotics/">Google&#8217;s Gemini Robotics</a></strong> runs on-device to interpret vision and language in real time.</p></li><li><p><strong><a href="https://www.tri.global/our-work/large-behavior-models">Toyota&#8217;s robots</a></strong> can score an apple, slice it, and interact with kitchen tools, all learned from human demonstrations.</p></li><li><p><strong><a href="https://www.physicalintelligence.company/blog/pi05">Physical Intelligence&#8217;s Pi0.5</a></strong> can enter an Airbnb home unseen at training time and perform long-horizon tasks such as making a bed.</p></li></ul><p>These systems are trained not by hand-coding every behavior, but by learning from data at scale. The art of engineering your data is critical for success: simply collecting whatever data you can is not enough for an end-to-end system. Data will become a bottleneck as we scale up end-to-end systems, so we need to be smarter about how we provide it.</p><div><hr></div><h2><strong>Why These Robots Are Still Limited</strong></h2><p>Despite the progress, we&#8217;re not in the Jetsons era yet. 
Here&#8217;s why:</p><ul><li><p><strong>Power constraints</strong>: Most robots last just 2&#8211;3 hours before needing a recharge.</p></li><li><p><strong>Data bottlenecks</strong>: Collecting real-world robot data is expensive and slow.</p></li><li><p><strong>Sim2Real gap</strong>: What works in simulation often fails in the real world.</p></li><li><p><strong>Limited reasoning</strong>: These models don&#8217;t &#8220;think&#8221;; they predict based on pattern matching.</p></li><li><p><strong>Limits of hardware</strong>: Precise tasks like inserting a key in a lock and turning it are at the limits of our hardware and sensing stack.</p></li><li><p><strong>Embodiment bias</strong>: We keep building humanoids because we&#8217;re human, but that may not be the best shape for solving the problem.</p></li></ul><div><hr></div><h2><strong>What&#8217;s Next?</strong></h2><p>We&#8217;re entering the age of <strong>embodied intelligence</strong>: AI systems embedded in the physical world.</p><p>Three big directions to watch:</p><ol><li><p><strong>Multimodal perception</strong>: Vision is great, but we also need force, tactile, depth, and sound to fully understand the world.</p></li><li><p><strong>Real-world reasoning</strong>: Not consciousness, but predictable and trustworthy behavior.</p></li><li><p><strong>Flexible embodiment</strong>: Not all robots should look like us. We need machines built for the task, not for the aesthetic.</p></li></ol><div><hr></div><h2><strong>How We Talk About This Matters</strong></h2><p>Words shape perception. If you&#8217;re in the business of building or explaining these systems, be careful with language. Robots don&#8217;t &#8220;understand&#8221; or &#8220;decide.&#8221; They infer, react, and execute.</p><p>Calling them sentient or implying agency leads to confusion and misaligned expectations. They&#8217;re machines. 
Fascinating, powerful, and increasingly useful, but machines nonetheless.</p><div><hr></div><h2><strong>Final Thoughts</strong></h2><p>Robotics is at an inflection point. We&#8217;ve gone from handcrafted control to data-driven generalization. And while there&#8217;s still a long road ahead, the journey is accelerating.</p><p>If you&#8217;re just getting into the field: don&#8217;t ignore any part of the stack. Robotics is inherently multidisciplinary. Mechanical, electrical, software, systems, all of it matters.</p><p>And if you want to work in robotics? Apply broadly. Build something. Stay humble. Be persistent.</p><p>We&#8217;re building the future. One imperfect, data-hungry robot at a time.</p>
]]></content:encoded></item><item><title><![CDATA[Taking a Quadruped Robot to School]]></title><description><![CDATA[There are seven senses, not five.]]></description><link>https://whattotelltherobot.com/p/taking-a-quadruped-robot-to-school</link><guid isPermaLink="false">https://whattotelltherobot.com/p/taking-a-quadruped-robot-to-school</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Wed, 20 Aug 2025 19:46:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FKh4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QyIO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QyIO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png 424w, https://substackcdn.com/image/fetch/$s_!QyIO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png 848w, 
https://substackcdn.com/image/fetch/$s_!QyIO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png 1272w, https://substackcdn.com/image/fetch/$s_!QyIO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QyIO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png" width="530" height="352.30320699708454" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:228,&quot;width&quot;:343,&quot;resizeWidth&quot;:530,&quot;bytes&quot;:137487,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QyIO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png 424w, https://substackcdn.com/image/fetch/$s_!QyIO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png 848w, 
https://substackcdn.com/image/fetch/$s_!QyIO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png 1272w, https://substackcdn.com/image/fetch/$s_!QyIO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe1cd7d7-cb81-42a9-b8ed-6f0b30a8010d_343x228.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>This spring I took one of the Spot robots from my lab at Brown to seven schools and a library. The ages ranged from toddlers at the three-year-old's daycare to middle- and high-school students. It's super fun to be the "robot Mom," and I've been taking robots to my sons' schools (and my sons&#8217; friends to the robots) since the oldest was an infant. (Shout-out to the Sony Aibo 1!) Kids are so excited to see the robot, a few are sometimes scared of it, and all of them love the "butt camera" that helps Spot move backwards. </p><p>Spot is one of the most advanced robots in the world, and taking it to a school is a fundamentally different experience from taking any other robot. When we first got two Spots for my lab, I was worried they would become paperweights, constantly breaking or falling over. Instead, they became the most desired, stable, and powerful platform we had for mobile manipulation, and my students quickly migrated to Spot from our Movo wheeled mobile manipulator. It's a rock-solid platform, and it's easy to take its capabilities for granted because it's so deceptively robust. We are so used to seeing people and animals locomote all around us that sometimes the kids don't appreciate the complexity of the stack they are seeing. And of course, as I learned a long time ago doing robot demonstrations, autonomy is boring. Kids love nothing more than driving the robot. 
</p><p>When we talk about a robot like Spot, it's useful to appreciate the sensors, actuators, and compute that enable it to do what it does. For kids, I start by talking about the five senses, but the five senses are actually not very important to Spot. It doesn't have a microphone (by default, anyway; my lab just installed a directional mic), and it doesn't have tactile sensing (but my lab just released a <a href="https://ivl.cs.brown.edu/research/unitac">model</a> that uses the existing joint sensors to localize touch to within about 10 cm). It has no nose or sense of taste unless you buy and install one. Vision, though, is another story: it has 6 or 12 cameras, depending on whether you count each RGB and infrared pair as one camera (since the pair is often processed as a single RGB-D sensor stream) or two (since there are actually two cameras inside each package).</p><p>Spot uses the cameras for obstacle avoidance and footstep planning, but it could walk just fine without them. The two most important sensors for Spot aren't any of the five, and kids are often surprised to hear this. (The three-year-old&#8217;s physical therapist was not surprised!) 
The first is the vestibular system, which in robotics is an IMU, or inertial measurement unit. This is your sense of balance: your sense of which way is down, and of how you are turning or spinning. I ask kids to imagine riding a train with their eyes closed; this is the sense that lets them tell when the train is stopping or going.</p><p>The second is the proprioceptive system, the sense of where your body is in space. I ask the kids to hold hands with a friend and close their eyes, and then ask the friend to raise their hand and lower it again. They can tell, even with their eyes closed, where their hand is in space. Similarly, Spot uses joint encoders to tell where its arm and legs are located relative to its body. It's these two sensors, the IMU and the joint encoders, that are fundamental to the robot's ability to walk and balance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FKh4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FKh4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FKh4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FKh4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!FKh4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FKh4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg" width="728" height="327.4822006472492" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:695,&quot;width&quot;:1545,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:306202,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/171295699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e08e4e-d606-44aa-ac91-9efe924c3ee2_1545x1159.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FKh4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FKh4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!FKh4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FKh4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43dd1a63-35c2-44ac-a0a9-f9c80ff93b61_1545x695.jpeg 1456w" sizes="100vw"></picture></div></a></figure></div><p>What strikes me most about these school visits is how naturally kids transition from wonder to understanding. 
They start by being amazed that a robot can walk at all, then quickly become curious about how it works, and finally want to take control themselves. By the end of each visit, they&#8217;re asking sophisticated questions about sensors and programming, suggesting new things Spot could do, and discussing the societal issues that robotics raises. These demonstrations remind me why I love robotics: it&#8217;s not just about building impressive machines, but about inspiring the next generation to think creatively about technology.</p>]]></content:encoded></item><item><title><![CDATA[Elephants Don't Write Sonnets]]></title><description><![CDATA[The Grounded Turing Test for Embodied AI]]></description><link>https://whattotelltherobot.com/p/elephants-dont-write-sonnets</link><guid isPermaLink="false">https://whattotelltherobot.com/p/elephants-dont-write-sonnets</guid><dc:creator><![CDATA[David Watkins]]></dc:creator><pubDate>Wed, 06 Aug 2025 21:04:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TbVC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TbVC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TbVC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png 424w, https://substackcdn.com/image/fetch/$s_!TbVC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png 848w, https://substackcdn.com/image/fetch/$s_!TbVC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png 1272w, https://substackcdn.com/image/fetch/$s_!TbVC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TbVC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png" width="220" height="206.90815006468304" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd5c38bc-423d-4568-917c-39763dd3b931_773x727.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:727,&quot;width&quot;:773,&quot;resizeWidth&quot;:220,&quot;bytes&quot;:616940,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://whattotelltherobot.com/i/170296570?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TbVC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png 424w, https://substackcdn.com/image/fetch/$s_!TbVC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png 848w, https://substackcdn.com/image/fetch/$s_!TbVC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png 1272w, https://substackcdn.com/image/fetch/$s_!TbVC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5c38bc-423d-4568-917c-39763dd3b931_773x727.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>What does it <em>really</em> mean for a machine to be intelligent?</p><p>In 1950, Alan Turing proposed his famous Imitation Game as a test for machine intelligence. 
The test was simple: could a computer, through text alone, fool a human into believing it was also human? For decades, this remained a distant goal. Today, large language models (LLMs) have, by many measures, passed this test. ChatGPT can effortlessly write a sonnet on demand, the very task Turing used as a sample question in his original paper.</p><p>But does that make it intelligent?</p><p>We present our take on this question in <a href="https://h2r.cs.brown.edu/wp-content/uploads/tellexwatkins2026.pdf">a forthcoming chapter</a> of the book <em>Designing an Intelligence</em>, edited by George Konidaris. We argue that despite this remarkable linguistic fluency, something profound is still missing. While LLMs can manipulate language with superhuman skill, we believe that they are not truly intelligent. They are more like incredibly sophisticated "calculators" for words. Turing&#8217;s core idea&#8212;that disembodied language is a sufficient test for intelligence&#8212;is false.</p><h3><strong>The Elephant in the Room: Embodiment</strong></h3><p>As robotics pioneer Rodney Brooks once noted, <a href="https://people.csail.mit.edu/brooks/papers/elephants.pdf">elephants don't play chess</a>. 
They don&#8217;t write sonnets, either. Yet we would all agree that an elephant is an intelligent creature. It has goals, it makes plans, and it engages in complex, goal-directed behavior in the physical world.</p><p>Consider the crow that painstakingly drops rocks into a tube of water to raise the level high enough to drink. This is a clear display of intelligence&#8212;understanding cause and effect, interacting with the world, and taking action to achieve a goal, all without a word of human language.</p><p>This is the missing piece: <strong>embodiment</strong>. A truly intelligent system must be a computational agent embedded in space and time, with high-dimensional sensors to perceive the world and high-dimensional motors to act within it. Intelligence can't just be about processing the internet; it has to be about processing the world.</p><h3><strong>A New Benchmark: The Grounded Turing Test</strong></h3><p>If the original Turing Test is no longer sufficient, what should replace it? We propose a new benchmark: the <strong>Grounded Turing Test</strong>.</p><p>To pass this test, an AI must be embodied in a robot that can use language in a way that is fundamentally <em>grounded</em> in the physical world. It&#8217;s not enough to just talk: the robot&#8217;s success or failure is defined by its physical and behavioral response to language. The test requires a fluid, collaborative dialogue in which the robot demonstrates a deep connection between words, perception, and action.</p><p>What would this look like in practice? The Grounded Turing Test comprises a whole suite of linguistic capabilities. 
Here are just a few examples:</p><ul><li><p><strong>Interpreting Instructions:</strong> You could tell the robot, "Pick up the red block," and it would need to perceive the block and physically pick it up.</p></li><li><p><strong>Understanding the World:</strong> You could state, "The red block is on the table," and the robot should update its internal model of the world so that it can use that information later to find the block.</p></li><li><p><strong>Asking for Help:</strong> If the robot can't reach the block, it should be able to ask you, "Can you give me the red block?"</p></li><li><p><strong>Explaining its Actions:</strong> If you ask, "Why did you drive to the table?" it should be able to explain its reasoning: "You want me to pick up the red block, and you told me that the red block is on the table."</p></li></ul><h3><strong>The Path to Truly Intelligent Robots</strong></h3><p>Building a system that can pass the Grounded Turing Test is the grand research challenge for our field. It requires us to move beyond static, pre-collected datasets and develop AI that can learn continuously from a real-time stream of high-dimensional sensory input. In the chapter, we outline a technical roadmap for achieving this goal, proposing a unified framework that we call the Human-Robot Collaborative POMDP (partially observable Markov decision process). This framework models the physical world, the human&#8217;s mental state, and the robot&#8217;s actions within a single decision-theoretic model.</p><p>Ultimately, the quest for AI is not about creating a better chatbot. It's about understanding the nature of intelligence itself. The beauty of language lies not in the words themselves, but in what they represent: high-dimensional sensory input, collected over time through active interaction with the physical world.</p><p>We imagine a future in which robots do not replace us but become our collaborators, augmenting our own abilities and making us more productive as a species. 
This journey starts with setting the right goal&#8212;a benchmark that captures the rich, embodied, and interactive nature of true intelligence.</p><p>We hope you'll join the conversation and look for <em>Designing an Intelligence</em> when it arrives in 2026.</p>]]></content:encoded></item><item><title><![CDATA[What to Tell the Robot]]></title><description><![CDATA[Robots, Space, and Time]]></description><link>https://whattotelltherobot.com/p/what-to-tell-the-robot</link><guid isPermaLink="false">https://whattotelltherobot.com/p/what-to-tell-the-robot</guid><dc:creator><![CDATA[Stefanie Tellex]]></dc:creator><pubDate>Fri, 18 Jul 2025 14:03:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/af64a430-02b2-4280-b750-51a6a6b671f7_773x727.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>AI is 
Robotics; Robotics is AI</h2><p>A robot has three parts: 1) sensors, 2) computers, and 3) actuators. We believe that any truly intelligent system must be a robot in this sense: it must take in high-frequency, high-dimensional sensory input from the world, process that input with its computers, and then output actuation commands. A robot is embedded in space and time: perceptual information is spatially grounded and temporally transient. Acting in the world therefore requires the robot to process perceptual information quickly and accurately in order to make &#8220;good&#8221; decisions about actuation.</p><h3>1. Why this, why now</h3><p>David and Stefanie collaborated on a chapter for George Konidaris&#8217; book <em>Designing an Intelligence</em>. In writing it, we came to realize that we are at a critical moment in the field of robotics. We believe that robotics is the grand challenge of AI, and we want to engage a broader audience in discussions about technology, safety, and what it takes to build a truly intelligent robot.</p><h3>2. What kind of community are you looking to build here?</h3><p>We are looking to create a community for discussion, brainstorming, and learning about robotics. Ask us questions, and tell us why we are wrong!</p><h3>3. Posting Schedule</h3><p>We plan to post roughly once a week, and all posts will be visible to everyone.</p><div><hr></div>]]></content:encoded></item></channel></rss>