# A Study of the Cognition-to-Action Disconnect in Tool-Using Agents

- Source: elvis (@omarsar0)
- Published: 2026-05-17 04:40
- AIHOT score: 71
- AIHOT link: https://aihot.virxact.com/items/cmp8u9kfs0lkpslnzy51k1dhj
- Original link: https://x.com/omarsar0/status/2055750162526715926

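## AI Summary

This interpretability paper focuses on tool-using agents. By probing hidden states, the authors find that the model often recognizes it should call a tool but fails to actually call one, with a mismatch rate of 26% to 54%. The problem concentrates entirely in the cognition-to-action transition rather than in cognition itself. The internal probe direction is decodable, but the late-layer last-token regime rotates the signal until it is nearly orthogonal to the action the model produces. The work aims to predict which interventions will be effective, and notes that common attributions such as poor prompting or insufficient training may overlook the late-layer geometry. This offers a plausible explanation for the performance ceilings seen when A/B testing tool-use prompts.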

## Full Text

Interesting interpretability paper on tool-using agents.

The authors probe hidden states and find the model often recognizes it should call a tool, but fails to actually call one. The mismatch ranges from 26 to 54%, and it concentrates entirely in the cognition-to-action transition, not in cognition itself.

In other words, the model usually knows it should call the tool.
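To make the mismatch figure concrete, here is a minimal sketch of one plausible way to compute it: among examples where a probe decodes "a tool call is needed", count the share where no tool call was emitted. The function name `mismatch_rate` and the toy labels are assumptions made for illustration; the paper's exact definition may differ.

```python
def mismatch_rate(probe_says_call, model_called):
    """Share of probe-positive examples in which the model emitted no tool call.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(probe_says_call) != len(model_called):
        raise ValueError("inputs must have the same length")
    # Keep only the examples where the probe decoded "a tool call is needed".
    recognized = [
        called
        for knows, called in zip(probe_says_call, model_called)
        if knows
    ]
    if not recognized:
        return 0.0
    # Among those, count the cases where the model did not act on it.
    missed = sum(1 for called in recognized if not called)
    return missed / len(recognized)


# Toy labels, made up for illustration only.
probe_says_call = [True, True, True, True, False, True, False, True]
model_called = [True, False, True, False, False, True, False, True]

rate = mismatch_rate(probe_says_call, model_called)
print(f"mismatch rate: {rate:.2f}")  # 2 of the 6 probe-positive cases lack a call
```

Conditioning on the probe-positive cases is what separates a failure of action from a failure of recognition: examples where the probe itself says no tool is needed do not count against the model here.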

The internal probe direction is decodable. But the late-layer last-token regime rotates that signal until it is nearly orthogonal to the action the model produces.
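A toy sketch of what "nearly orthogonal" means geometrically, using made-up 2-D vectors in place of real hidden states. The mean-difference probe, the 85-degree rotation, and every name below are assumptions chosen for illustration; none of this is the paper's method or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_difference_direction(pos, neg):
    """A simple linear probe direction: mean of one class minus mean of the other."""
    dim = len(pos[0])
    mean_pos = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def rotate_2d(v, degrees):
    """Rotate a 2-D vector counterclockwise by the given angle."""
    t = math.radians(degrees)
    x, y = v
    return [x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t)]


# Made-up 2-D "hidden states" for the two classes (illustration only).
needs_tool = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.0]]
no_tool = [[-2.0, 0.0], [-1.9, 0.2], [-2.1, -0.1]]

# The classes are well separated, so the probe direction is easy to decode.
probe_dir = mean_difference_direction(needs_tool, no_tool)

# Hypothetical action direction: the probe direction after an 85-degree rotation.
action_dir = rotate_2d(probe_dir, 85.0)

alignment = cosine(probe_dir, action_dir)
print(f"cosine(probe, action) = {alignment:.3f}")
```

A rotation preserves vector length, so the cosine here equals the cosine of the rotation angle, which is close to zero at 85 degrees. The information is still present in the representation, but very little of it projects onto the direction that drives the action.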

This work tries to predict which interventions will actually work and which will not. Most will blame bad prompting or weak tool-call training, and probably ignore the late-layer geometry.

If you have been A/B testing tool-use prompts and getting weird ceilings, this work might offer a good explanation for that behavior.

Paper: https://arxiv.org/abs/2605.14038

Learn to build effective AI agents in our academy: https://academy.dair.ai/
