# LoopCoder-v2: Looping Only Once Is Enough for Efficient Use of Test-Time Compute

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-18 09:17
- AIHOT score: 67
- AIHOT link: https://aihot.virxact.com/items/cmqitcviw046zsl5wp4t8hqsp
- Original link: https://x.com/rohanpaul_ai/status/2067416357352984940

## AI Summary

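The paper "LoopCoder-v2" challenges the view that more test-time compute is always better. The authors propose the Parallel Loop Transformer architecture, in which loops can run in parallel and share memory. They trained 7B-parameter code models (with 1, 2, 3, and 4 loops), pretrained on 18T tokens and then fine-tuned, and tested them on code writing, reasoning, software engineering, and tool-use tasks. Main result: 2 loops worked best, raising SWE-bench Verified from 43.0 to 64.4, while performance dropped with 3 and 4 loops. Internal analysis shows that the second loop performs meaningful refinement (changing hidden states, attention patterns, and predictions), while later loops mostly add repetition and noise. Conclusion: adding one hidden loop can greatly improve performance, but adding more is not automatically beneficial.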

## Body

This paper makes a big claim that pushes against the common idea that more test-time compute should keep helping.

It claims a code model gets much better when it rethinks once inside itself (i.e., by looping once), but worse when it keeps rethinking.

The first loop builds context, the second loop refines it, and later loops mostly disturb it.

The paper studies a faster design called Parallel Loop Transformer, where loops can run almost in parallel and share memory, so the authors can ask a cleaner question about how many loops are actually useful.
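As a rough illustration of what "looping" means here, the sketch below reuses one block's weights for several passes over the same hidden state. It is a toy, sequential version written for this note: the names `shared_block` and `looped_forward`, the tanh residual update, and the tiny weight matrix are all assumptions, and it does not reproduce the Parallel Loop Transformer's near-parallel execution or memory sharing, which the post does not detail.

```python
import math


def shared_block(hidden, weights):
    """One pass of a toy residual block: h_i + tanh(sum_j W[i][j] * h_j)."""
    out = []
    for row, h_i in zip(weights, hidden):
        pre_activation = sum(w * h for w, h in zip(row, hidden))
        out.append(h_i + math.tanh(pre_activation))
    return out


def looped_forward(hidden, weights, num_loops):
    """Apply the same block num_loops times, reusing one set of weights.

    Returns the input state followed by the state after each loop,
    so the result has num_loops + 1 entries.
    """
    states = [list(hidden)]
    for _ in range(num_loops):
        hidden = shared_block(hidden, weights)
        states.append(hidden)
    return states


if __name__ == "__main__":
    # Hypothetical 2-dimensional hidden state and weights, for illustration only.
    toy_weights = [[0.0, 0.5], [0.5, 0.0]]
    for loops in (1, 2, 3, 4):
        final_state = looped_forward([1.0, -1.0], toy_weights, loops)[-1]
        print(loops, final_state)
```

The point of the sketch is only that extra loops add compute without adding parameters, which is why the number of loops can be varied as a test-time compute knob.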

They trained 7B code models with 1, 2, 3, and 4 loops on 18T tokens, then tuned and tested them on code writing, code reasoning, software engineering, and tool-use tasks.

The main result is that 2 loops worked best, raising SWE-bench Verified from 43.0 to 64.4, while 3 and 4 loops often got worse.
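To put the reported jump in perspective, here is the arithmetic on the two SWE-bench Verified scores quoted above. The variable names are made up for this note, and only the two scores come from the post.

```python
# Scores quoted in the post; everything else is derived arithmetic.
baseline_score = 43.0
two_loop_score = 64.4

# Absolute gain in benchmark points, rounded to hide float noise.
absolute_gain = round(two_loop_score - baseline_score, 1)

# Relative gain as a percentage of the baseline score.
relative_gain_pct = round(absolute_gain / baseline_score * 100, 1)

print(absolute_gain)      # 21.4
print(relative_gain_pct)  # 49.8
```

So the reported improvement is 21.4 points, or roughly a 50% relative increase over the starting score.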

Their internal checks suggest loop 2 does the real useful refinement， because it changes the model's hidden states， attention patterns， and predictions in meaningful ways.
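One simple way to probe this kind of claim is to compare the hidden state after each loop with the state before it. The sketch below uses cosine distance for that. This is an assumption made for illustration: the post does not say which metrics the authors used, and the helper names and example vectors are hypothetical, not data from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def per_loop_change(states):
    """Cosine distance (1 - similarity) between consecutive loop states.

    states[0] is the state before any refinement; states[k] is the state
    after the k-th further loop. A larger value means that loop changed
    the direction of the hidden state more.
    """
    return [
        1.0 - cosine_similarity(previous, current)
        for previous, current in zip(states, states[1:])
    ]


if __name__ == "__main__":
    # Made-up vectors: a large change first, then a pure rescaling.
    toy_states = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
    print(per_loop_change(toy_states))
```

In this toy input the first step rotates the state completely while the second only rescales it, so the second step registers no change in direction, which is the flavor of "weaker, more repetitive" update the post describes for later loops.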

After loop 2, the extra loops mostly add weaker, more repetitive changes, while a built-in position shift keeps adding the same kind of mismatch cost.

Overall, the paper gives a simple lesson for efficient test-time compute: adding 1 hidden loop can help a lot, but adding more is not automatically better.

----

Link: arxiv.org/abs/2606.18023

Title: "LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling"