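This study argues that an AI agent's practical performance depends more on the external control system around the model (the agent harness) than on the prompt alone. Many current agents look like a single model, but their behavior is actually driven by surrounding code for planning, tool calls, and memory management, so long tasks tend to fail at points such as state loss and verification drift. In response, the paper proposes "Natural-Language Agent Harnesses", which aim to express the control flow explicitly in structured natural language so that it can be inspected, transferred, and tested. The study finds that more complex harnesses can significantly change agent behavior but do not deliver stable performance gains, which suggests that harness design is a key choice for ensuring reliability rather than an instant cure-all.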
This paper shows that agent performance depends less on prompts alone and more on the harness around them.
"Agent intelligence" is becoming partly a systems problem. Many AI agents look like a single model, but their real behavior comes from the surrounding code that controls planning, tools, memory, retries, checking, and stopping.
A model may reason well in one step, but long tasks fail in messier places: state disappears, verification drifts, tools return partial evidence, and the agent forgets which intermediate artifact actually matters.
Natural-Language Agent Harnesses try to make that control layer visible.
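To make "control layer" concrete, here is a minimal Python sketch of the kind of logic a harness holds: explicit state, a verification check, retries, and stop conditions. It is an illustration under assumptions, not the paper's format or implementation. Every name in it (`HarnessState`, `run_harness`, `toy_step`, `toy_verify`) is hypothetical, and the model call is replaced by a stub function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class HarnessState:
    """Explicit task state, kept outside the model's context window."""
    goal: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def run_harness(
    goal: str,
    step: Callable[[HarnessState], Tuple[str, str]],
    verify: Callable[[str, str], bool],
    max_steps: int = 5,
    max_retries: int = 2,
) -> HarnessState:
    """Hypothetical control loop: propose, verify, retry, stop."""
    state = HarnessState(goal=goal)
    for i in range(max_steps):
        for attempt in range(max_retries + 1):
            name, value = step(state)          # stand-in for a model/tool call
            if verify(name, value):            # checking happens in the harness
                state.artifacts[name] = value  # only verified artifacts persist
                state.log.append(f"step {i}: accepted {name}")
                break
            state.log.append(f"step {i}: rejected {name} (attempt {attempt + 1})")
        else:
            # Every attempt failed verification: stop instead of drifting on.
            state.log.append(f"step {i}: gave up after retries")
            return state
        if "final" in state.artifacts:
            state.log.append("stop: final artifact produced")
            return state
    state.log.append("stop: step budget exhausted")
    return state


def toy_step(state: HarnessState) -> Tuple[str, str]:
    # Stub "model": produce a draft first, then a final artifact built on it.
    if "draft" not in state.artifacts:
        return "draft", "outline"
    return "final", state.artifacts["draft"] + " -> report"


def toy_verify(name: str, value: str) -> bool:
    # Stub check: accept any non-empty artifact.
    return bool(value.strip())


result = run_harness("write report", toy_step, toy_verify)
print(result.log)
```

Even in this toy version, the stub model decides none of the retry, acceptance, or stopping behavior. All of it lives in the loop, which is the layer the paper argues should be visible and testable.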