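A new paper from Microsoft and York University points out that many studies attribute human properties such as understanding, empathy, and anxiety to LLMs without rigorous testing, and often build these concepts into the test design from the outset. The paper argues that, in principle, the old strategy game Age of Empires II can also serve as a computational substrate that implements logic gates and trains a small perceptron. Suppose the same language model were rebuilt inside the game, with moving goats as bits, and produced similar sentences. People would then no longer say it "understands" or "has empathy". The paper does not deny AI cognition. Instead, it exposes a measurement problem: many claims about human-like attributes of LLMs rest on the interface and the observer's presuppositions rather than on the system itself.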
New Microsoft + York Univ paper argues that LLMs should not be treated as human-like without clear tests and narrower claims.
Many studies ask whether LLMs have things like understanding, empathy, anxiety, or self-awareness, but they often build those ideas into the test from the start.
The paper shows that, in principle, the old strategy game Age of Empires II can implement logic gates, train a tiny perceptron, and serve as a substrate for computation.
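For readers unfamiliar with the term, here is a minimal Python sketch of what "training a tiny perceptron" on a logic gate means. It is a generic textbook perceptron. The names `train_perceptron` and `predict` and the NAND training set are hypothetical choices made for illustration. This is not the paper's in-game construction. The paper's claim is only that a computation of this kind can, in principle, be carried out inside the game instead of with Python numbers.

```python
# Toy perceptron trained on a 2-input logic gate (NAND) with the classic
# perceptron learning rule. Illustrative only: NOT the paper's in-game
# construction, just the kind of computation it refers to.

def step(z: int) -> int:
    """Threshold activation: fire (1) if the weighted sum is positive."""
    return 1 if z > 0 else 0

def predict(weights, bias, x):
    """Weighted sum of inputs plus bias, passed through the threshold."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, lr=1, epochs=20):
    """Perceptron learning rule: w += lr * (target - output) * x."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error:
                errors += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if errors == 0:  # converged: every sample classified correctly
            break
    return weights, bias

# Truth table for NAND: output is 0 only when both inputs are 1.
NAND = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
weights, bias = train_perceptron(NAND)
for x, target in NAND:
    print(x, "->", predict(weights, bias, x), "expected", target)
```

NAND is a convenient target because it is linearly separable, so the perceptron rule is guaranteed to converge. It is also functionally complete: any Boolean circuit can be built from NAND gates alone.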
If the same language model could be rebuilt inside the game, with goats moving around as bits, would we still say it "understands," "feels anxiety," or "has empathy" when it produces the same sentence?
The point is not that the game is secretly intelligent, but that the same computation can be represented in a very different form.
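As a toy illustration of the point that the same computation can take very different forms, the sketch below computes XOR from NAND gates on two "substrates". The first is ordinary Python booleans. The second is a hypothetical goat encoding in which a goat's x-coordinate stores a bit (x = 0 for 0, x = 10 for 1). The `Goat` class and its coordinates are invented for illustration. They do not model the game's actual mechanics or the paper's construction.

```python
# Same function, two "substrates": plain booleans vs. a hypothetical
# encoding where a goat's position stores each bit.
# Toy illustration of substrate independence, not the paper's setup.
from dataclasses import dataclass

def nand_bool(a: bool, b: bool) -> bool:
    return not (a and b)

@dataclass(frozen=True)
class Goat:
    x: int  # hypothetical encoding: x == 0 means bit 0, x == 10 means bit 1

def encode(bit: bool) -> Goat:
    return Goat(10 if bit else 0)

def decode(goat: Goat) -> bool:
    return goat.x == 10

def nand_goat(a: Goat, b: Goat) -> Goat:
    # The output goat ends up at x = 10 unless both input goats are at x = 10.
    return Goat(0 if (a.x == 10 and b.x == 10) else 10)

def xor(nand, a, b):
    """XOR built only from NAND gates; works with either substrate's NAND."""
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

for a in (False, True):
    for b in (False, True):
        on_bools = xor(nand_bool, a, b)
        on_goats = decode(xor(nand_goat, encode(a), encode(b)))
        print(a, b, on_bools, on_goats)
```

Both columns agree on every input. Only the encoding differs, which is the sense in which the computation, not its medium, determines the output.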
If an LLM-like system were rebuilt inside that game, its answers might stay similar, but people would probably find its "feelings" or "understanding" much less convincing.
The authors argue that this exposes a serious measurement problem: many human-like claims about LLMs may depend on the interface and the observer, not only on the system itself.
The paper is not saying LLMs definitely lack human-like attributes, or that all talk of AI cognition is nonsense.
It is saying that many experiments smuggle the conclusion into the setup: they assume the model has, or cannot have, a human-like property, then interpret behavior through that assumption.
----
Link: arxiv.org/abs/2605.31514
Title: "If LLMs Have Human-Like Attributes, Then So Does Age of Empires II"