当前AI智能体的扩展方法常错误地将计算资源消耗等同于学习证据。新研究指出,两次运行消耗相同预算,但反馈的有效性可能天差地别。为此,研究提出了“有效反馈计算”(EFC)指标,仅统计那些正确、新颖、相关且被记住、并能改变后续决策的反馈。研究还结合任务需求对EFC进行归一化。实验表明,任务归一化的EFC比原始计算指标更能预测失败。在一项匹配预算测试中,采用更好反馈的方法将任务成功率从0.27提升至0.90,而成本和工具调用次数保持不变。 链接:arxiv.org/abs/2605.29682 标题:"Scaling Laws for Agent Harnesses via Effective Feedback Compute"
Better AI agent systems scale by remembering useful feedback, not by spending more compute.
The simple mistake is to count tokens, calls, or dollars as if they were all evidence.
The authors say those numbers miss the real issue, because 2 runs can spend the same budget while only 1 gets feedback that is correct, new, relevant, and remembered.
An agent harness is not just a wrapper around a model; it is a feedback machine that decides what to test, what to trust, what to store, and what to ignore.