Rohan Paul @rohanpaul_ai

2026-06-01 22:12 · 31 days ago

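AI Summary

Current approaches to scaling AI agents often mistake compute consumption for evidence of learning. A new study notes that two runs can consume the same budget while differing enormously in how effective their feedback is. To address this, it proposes the Effective Feedback Compute (EFC) metric, which counts only feedback that is correct, novel, relevant, remembered, and able to change later decisions. The study also normalizes EFC by task demand. Experiments show that task-normalized EFC predicts failures better than raw compute metrics. In one matched-budget test, the method with better feedback raised task success from 0.27 to 0.90 while cost and tool calls stayed the same. Link: arxiv.org/abs/2605.29682 Title: "Scaling Laws for Agent Harnesses via Effective Feedback Compute"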

Better AI agent systems scale by remembering useful feedback, not by spending more compute.

The simple mistake is to count tokens, calls, or dollars as if they were all evidence.

The authors say those numbers miss the real issue, because two runs can spend the same budget while only one gets feedback that is correct, new, relevant, and remembered.

An agent harness is not just a wrapper around a model; it is a feedback machine that decides what to test, what to trust, what to store, and what to ignore.

Their answer is Effective Feedback Compute, or EFC, a score that counts feedback only when it teaches the agent something useful and changes later decisions.

They also divide EFC by task demand, because a small lookup task and a messy software-repair task need different amounts of helpful feedback before the agent has enough to solve them.
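To make the counting idea concrete, here is a minimal sketch, not the paper's actual formula. It assumes each piece of feedback can be tagged with five yes/no flags, that every qualifying event counts as one unit, and that task demand is a single positive number. The names `FeedbackEvent`, `effective_feedback_compute`, and `normalized_efc` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackEvent:
    """One piece of feedback seen by the harness (hypothetical schema)."""
    correct: bool            # the feedback was accurate
    novel: bool              # it told the agent something it did not already know
    relevant: bool           # it bears on the current task
    remembered: bool         # the harness kept it available for later steps
    changed_decision: bool   # a later decision differed because of it
    cost: float = 1.0        # raw compute spent to obtain it (tokens, calls, dollars)


def raw_compute(events):
    """Raw budget: every event counts, useful or not."""
    return sum(e.cost for e in events)


def effective_feedback_compute(events):
    """Count only events that pass all five filters (assumption: one unit each)."""
    return sum(
        1
        for e in events
        if e.correct and e.novel and e.relevant and e.remembered and e.changed_decision
    )


def normalized_efc(events, task_demand):
    """EFC divided by task demand (assumption: task_demand is a positive number)."""
    if task_demand <= 0:
        raise ValueError("task_demand must be positive")
    return effective_feedback_compute(events) / task_demand


# Two toy runs with the same raw budget but different amounts of useful feedback.
useful = FeedbackEvent(True, True, True, True, True)
forgotten = FeedbackEvent(True, True, True, False, False)

run_a = [useful] * 3 + [forgotten] * 1
run_b = [useful] * 1 + [forgotten] * 3

print(raw_compute(run_a), raw_compute(run_b))  # 4.0 4.0
print(normalized_efc(run_a, task_demand=4), normalized_efc(run_b, task_demand=4))  # 0.75 0.25
```

Both toy runs spend the same raw budget, yet only the normalized score separates them. The numbers are invented for illustration and are not results from the paper.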

They tested this on synthetic tasks, code tasks with executable tests, real benchmark traces, held-out settings, and a new prospective batch, then compared EFC with raw compute and a strong agent-scaling baseline.

The main result is that task-normalized EFC predicted failures much better than raw compute, and in one matched-budget test, better feedback raised success from 0.27 to 0.90 while cost and tool calls stayed fixed.

----

Link - arxiv.org/abs/2605.29682

Title: "Scaling Laws for Agent Harnesses via Effective Feedback Compute"

Agents · arXiv

Rohan Paul @rohanpaul_ai · X
