elvis@omarsar0

2026-05-05 22:59 · 58 days ago

AI Summary

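The research argues that what drives agent performance is not the external orchestration framework but a single core inner skill: parallel reasoning followed by deliberation. It systematizes this process as a two-stage pipeline and trains it as a learnable, model-internal capability via reinforcement learning with verifiable rewards (RLVR). Experiments show sizable gains: for example, GPT-OSS-20B rises from 69.7% to 85.5% on LiveCodeBench, and R1-Distill-Qwen-32B rises from 35.7% to 69.3% on IFEval. The takeaway is that once such a core skill can be internalized into the model, harness advantages become model advantages, and in the long run models should have this capability natively.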

// HeavySkill //

One of the cleaner takes on agentic harness design I've read.

They argue that what actually drives agent harness performance is not the orchestration code. It's a single inner skill: parallel reasoning followed by deliberation.

If you can internalize that into the model, most of the scaffolding becomes optional.

The paper systematizes this as a two-stage pipeline you can run beneath any harness, then trains it as a learnable skill via RLVR.
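To make the two stages concrete, here is a minimal sketch of parallel reasoning followed by deliberation. It is my own illustration, not the paper's implementation: `generate` is a hypothetical prompt-in, completion-out callable standing in for whatever model client you use, and the prompt wording, the `k=4` default, and the function names are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model interface (an assumption): takes a prompt, returns a completion.
Generate = Callable[[str], str]


def parallel_reason(generate: Generate, problem: str, k: int = 4) -> List[str]:
    """Stage 1: sample k independent reasoning traces for the same problem."""
    prompts = [
        f"Problem:\n{problem}\n\nReason step by step (attempt {i + 1})."
        for i in range(k)
    ]
    # pool.map returns results in the same order as the prompts.
    with ThreadPoolExecutor(max_workers=k) as pool:
        return list(pool.map(generate, prompts))


def deliberate(generate: Generate, problem: str, traces: List[str]) -> str:
    """Stage 2: show the model every trace and ask it for one final answer."""
    numbered = "\n\n".join(
        f"[Candidate {i + 1}]\n{trace}" for i, trace in enumerate(traces)
    )
    prompt = (
        f"Problem:\n{problem}\n\n"
        f"Independent attempts:\n\n{numbered}\n\n"
        "Compare the attempts, resolve any disagreements, and give one final answer."
    )
    return generate(prompt)


def heavy_think(generate: Generate, problem: str, k: int = 4) -> str:
    """Run both stages: k parallel traces, then a single deliberation pass."""
    traces = parallel_reason(generate, problem, k)
    return deliberate(generate, problem, traces)
```

Nothing here depends on a particular harness, which is the point of the post: the whole loop is two prompts around the same model, so it can sit beneath any orchestration layer or, per the paper's argument, be trained into the model itself.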

The numbers:

GPT-OSS-20B jumps from 69.7% (M@K) to 85.5% (HM@4) on LiveCodeBench under the heavy-thinking variant.

R1-Distill-Qwen-32B nearly doubles on IFEval, from 35.7% to 69.3%.

Several models reach Pass@N-level performance with HeavySkill.
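For a sense of what "Pass@N-level" means as a ceiling, here is a small sketch comparing two ways of scoring the same set of samples. The post does not define its metrics, so this rests on my own reading: Pass@N as the usual "at least one of N samples is correct", and a mean-over-samples score as the baseline. The function names and the toy data are illustrative only.

```python
from typing import List


def mean_at_k(correct: List[List[bool]]) -> float:
    """Average per-sample accuracy: each problem contributes the
    fraction of its samples that are correct."""
    return sum(sum(row) / len(row) for row in correct) / len(correct)


def pass_at_n(correct: List[List[bool]]) -> float:
    """Fraction of problems where at least one sample is correct."""
    return sum(any(row) for row in correct) / len(correct)


# Toy data (made up): 3 problems, 4 samples each; True means the sample was correct.
results = [
    [True, False, False, True],
    [False, False, False, False],
    [False, True, False, False],
]

print(mean_at_k(results))
print(pass_at_n(results))
```

Under this reading, the gap between the two scores is the headroom a deliberation step can recover: the correct answer is already somewhere in the samples, and the model only has to pick or reconstruct it.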

Harness wins start to look like model wins once you can train them in. If parallel-reasoning-plus-deliberation really is the inner skill, the long arc is models that come with it baked in, not orchestration glue around them.

Paper: https://arxiv.org/abs/2605.02396

Learn to build effective AI agents in our academy: https://academy.dair.ai/

Agents, Reasoning, Papers/Research

elvis@omarsar0 · X
