# Test-Time Compute Scaling for Agentic Coding

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-04-16 08:00
- AIHOT link: https://aihot.virxact.com/items/cmob5e9hb061nsl1yduhykefn
- Original paper: https://arxiv.org/abs/2604.16529

## AI Summary

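To address the verbose, hard-to-compare outputs of long-horizon coding agents, this paper proposes a test-time compute scaling framework based on trajectory compression. It converts each execution trace into a structured summary that preserves key hypotheses, progress, and failure modes, enabling effective selection and reuse. The framework comprises two mechanisms: Recursive Tournament Voting (RTV) for parallel scaling and Parallel-Distill-Refine (PDR) for sequential scaling. In experiments, Claude-4.5-Opus improves from 70.9% to 77.6% on SWE-Bench Verified and from 46.9% to 59.1% on Terminal-Bench v2.0, supporting the central role of representation, selection, and reuse.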

## Main Text

Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked, or refined. Long-horizon coding agents violate this premise: each attempt produces an extended trajectory of actions, observations, errors, and partial progress taken by the agent. In this setting, the main challenge is no longer generating more attempts, but representing prior experience in a form that can be effectively selected from and reused. We propose a test-time scaling framework for agentic coding based on compact representations of rollout trajectories. Our framework converts each rollout into a structured summary that preserves its salient hypotheses, progress, and failure modes while discarding low-signal trace details. This representation enables two complementary forms of inference-time scaling. For parallel scaling, we introduce Recursive Tournament Voting (RTV), which recursively narrows a population of rollout summaries through small-group comparisons. For sequential scaling, we adapt Parallel-Distill-Refine (PDR) to the agentic setting by conditioning new rollouts on summaries distilled from prior attempts. Our method consistently improves the performance of frontier coding agents across SWE-Bench Verified and Terminal-Bench v2.0. For example, by using our method, Claude-4.5-Opus improves from 70.9% to 77.6% on SWE-Bench Verified (mini-SWE-agent) and from 46.9% to 59.1% on Terminal-Bench v2.0 (Terminus 1). Our results suggest that test-time scaling for long-horizon agents is fundamentally a problem of representation, selection, and reuse.
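The two scaling mechanisms described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the group size, the comparison procedure, or the prompt formats, so `recursive_tournament`, `parallel_distill_refine`, `judge`, `rollout`, `summarize`, `distill`, `group_size`, `rounds`, and `width` are all hypothetical names, and summaries are modeled as plain strings.

```python
from typing import Callable, List, Sequence

Summary = str
# Hypothetical judge: given a small group of summaries, return the index of
# the preferred one. In practice this would presumably be a model call.
Judge = Callable[[Sequence[Summary]], int]


def recursive_tournament(
    summaries: Sequence[Summary], judge: Judge, group_size: int = 4
) -> Summary:
    """Narrow a population of rollout summaries via small-group comparisons."""
    if not summaries:
        raise ValueError("need at least one summary")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    pool: List[Summary] = list(summaries)
    while len(pool) > 1:
        winners: List[Summary] = []
        for start in range(0, len(pool), group_size):
            group = pool[start:start + group_size]
            # A leftover group of one advances without a comparison.
            winners.append(group[judge(group)] if len(group) > 1 else group[0])
        pool = winners
    return pool[0]


def parallel_distill_refine(
    task: str,
    rollout: Callable[[str, str], str],          # (task, context) -> trajectory
    summarize: Callable[[str], Summary],         # trajectory -> compact summary
    distill: Callable[[Sequence[Summary]], str], # summaries -> context for next round
    rounds: int = 2,
    width: int = 4,
) -> List[Summary]:
    """Run rounds of rollouts, each conditioned on summaries distilled from the previous round."""
    context = ""  # assumption: the first round starts with no prior experience
    all_summaries: List[Summary] = []
    for _ in range(rounds):
        # Rollouts within a round are independent; they run in a loop here for simplicity.
        summaries = [summarize(rollout(task, context)) for _ in range(width)]
        all_summaries.extend(summaries)
        context = distill(summaries)
    return all_summaries
```

One way to combine the two, again as an assumption rather than the paper's stated pipeline, is to pass the summaries returned by `parallel_distill_refine` to `recursive_tournament` to pick a final candidate.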
