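Meta research finds that, on coding-agent tasks, reusing short summaries of past attempts performs significantly better than using raw logs. The paper argues that for long-horizon coding tasks, the main bottleneck has shifted from code generation to how the agent's working process is remembered and represented. Its method turns each error-filled "messy trajectory" into a compact summary of the core hypothesis, progress, and failure points, and the system uses tournament-style selection to pick the best summaries to guide the next round of attempts. In tests with Claude 4.5 Opus, the method raised its SWE-Bench Verified score substantially, from 70.9% to 77.6%, showing that the key to better performance is storing experience in a reusable form.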
A Meta paper shows that coding agents get much better when they reuse short summaries of past attempts instead of raw logs.
In other words, stronger coding agents do not just need more attempts; they need better ways to remember those attempts.
That sounds obvious until you look at what an agent actually produces: not an answer, but a messy trail of file reads, shell commands, errors, partial fixes, and abandoned ideas.
The paper's idea is to turn each full attempt into a compact summary of the main guess, partial progress, and failure points, then use those summaries both to pick the best attempts and to guide new ones.
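To make the idea concrete, here is a minimal Python sketch of the two uses of summaries described above: picking the best attempt and seeding a new one. It is an illustration under stated assumptions, not the paper's implementation. The names `AttemptSummary`, `tournament_select`, `fewer_failures`, and `build_prompt` are all hypothetical, the bracket is a plain single-elimination tournament, and the comparison rule is a toy stand-in for whatever judge a real system would use (most likely a model call).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AttemptSummary:
    """Compact record of one full attempt (field names are illustrative)."""
    hypothesis: str                                           # the main guess
    progress: List[str] = field(default_factory=list)        # what partly worked
    failure_points: List[str] = field(default_factory=list)  # where it broke

    def render(self) -> str:
        """Format the summary as a few short lines of text."""
        lines = [f"Hypothesis: {self.hypothesis}"]
        lines += [f"Progress: {p}" for p in self.progress]
        lines += [f"Failed at: {f}" for f in self.failure_points]
        return "\n".join(lines)


Judge = Callable[[AttemptSummary, AttemptSummary], AttemptSummary]


def fewer_failures(a: AttemptSummary, b: AttemptSummary) -> AttemptSummary:
    """Toy judge (an assumption): fewer failures wins, then more progress."""
    key_a = (len(a.failure_points), -len(a.progress))
    key_b = (len(b.failure_points), -len(b.progress))
    return a if key_a <= key_b else b


def tournament_select(summaries: List[AttemptSummary], prefer: Judge) -> AttemptSummary:
    """Single-elimination bracket: compare pairs and advance the winners."""
    if not summaries:
        raise ValueError("need at least one summary")
    pool = list(summaries)
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool) - 1, 2):
            next_round.append(prefer(pool[i], pool[i + 1]))
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # odd one out gets a bye this round
        pool = next_round
    return pool[0]


def build_prompt(task: str, summaries: List[AttemptSummary]) -> str:
    """Seed a new attempt with the summaries instead of the raw logs."""
    past = "\n\n".join(s.render() for s in summaries)
    return (
        f"Task: {task}\n\n"
        f"Summaries of earlier attempts:\n{past}\n\n"
        "Start a new attempt."
    )
```

The design point is that both selection and guidance operate on the same small objects, so the raw trail of file reads, shell commands, and errors never has to be passed to the next attempt.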