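Speaking at the presentation of Pope Leo XIV's encyclical, Anthropic co-founder Chris Olah said that every frontier AI lab, Anthropic included, is subject to incentives such as funding and competitive pressure that can conflict with its goals. AI models are not traditional engineered artifacts; they are "grown" from language on brain-like structures, and even their builders cannot fully understand their internal mechanisms. He also warned that AI could displace labor on a large scale while the economic gains concentrate in a few countries. The most striking finding is that his interpretability team has found "mysterious" states inside models that resemble structures from human neuroscience, with evidence suggesting that models may have functional internal states akin to introspection, corresponding to human emotions such as joy and fear. Olah admitted he does not know exactly what this means but said it warrants ongoing scrutiny, and he stressed that outside criticism is essential for AI labs.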
A few things Anthropic co-founder Chris Olah told the Vatican today:
- Every frontier AI lab, including Anthropic, sits inside incentives that can conflict with doing the right thing: money, frontier pressure, geopolitics, pride, and ambition.
- AI is not engineered like a bridge or an airplane: models are "grown" from human language on brain-like structures, so even their builders do not fully understand them.
- He compared modern AI to "bringing a fictional character to life," except now those characters talk to us, do work, and hold jobs.
- AI could displace human labor at very large scale, while the economic gains are concentrated in a few wealthy nations with no real mechanism to share them globally.
- Anthropic's interpretability team keeps finding things inside AI models that are "mysterious" and "unsettling," including structures that mirror human neuroscience.
- The most explosive claim is that researchers have found evidence of AI introspection and internal states that functionally mirror joy, satisfaction, fear, grief, and unease.
- He openly admitted he does not exactly know what those internal states mean, which makes the claim more serious because it is not being sold as certainty.
"I don't know what that means, but I think it warrants ongoing discernment."
- The world needs critics outside AI labs because insiders cannot fully see what their own incentives hide from them.