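The research argues that what drives agent performance is not the external orchestration framework but one core inner skill: parallel reasoning followed by deliberation. It systematizes this process as a two-stage pipeline and trains it as a learnable, model-internal capability via reinforcement learning with verifiable rewards (RLVR). The reported gains are substantial: GPT-OSS-20B improves from 69.7% to 85.5% on LiveCodeBench, and R1-Distill-Qwen-32B improves from 35.7% to 69.3% on IFEval. The takeaway is that once a core skill like this is internalized into the model, the harness's advantage becomes the model's own advantage, and in the long run models should have this capability natively.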
// HeavySkill //
One of the cleaner takes on agentic harness design I've read.
They argue that what actually drives agent harness performance is not the orchestration code. It's a single inner skill: parallel reasoning followed by deliberation.
If you can internalize that into the model, most of the scaffolding becomes optional.
The paper systematizes this as a two-stage pipeline you can run beneath any harness, then trains it as a learnable skill via RLVR.
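To make the two stages concrete, here is a minimal sketch of how I picture parallel reasoning followed by deliberation. This is not the paper's implementation. Every name here (`generate`, `parallel_reason`, `build_deliberation_prompt`, `heavy_answer`, `toy_model`, `k`) is hypothetical, the deliberation prompt wording is my own, and `generate` stands in for whatever model call your harness already has.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model interface: prompt string in, completion string out.
Generate = Callable[[str], str]

# Marker used to label each candidate inside the deliberation prompt (assumed format).
CANDIDATE_MARKER = "### Candidate"


def parallel_reason(prompt: str, generate: Generate, k: int = 4) -> List[str]:
    """Stage 1: sample k independent reasoning traces for the same prompt."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        return list(pool.map(generate, [prompt] * k))


def build_deliberation_prompt(prompt: str, candidates: List[str]) -> str:
    """Pack the original task and all candidate traces into one prompt."""
    parts = [f"Task:\n{prompt}\n"]
    for i, text in enumerate(candidates, start=1):
        parts.append(f"{CANDIDATE_MARKER} {i}\n{text}\n")
    parts.append("Compare the candidates, resolve disagreements, and give one final answer.")
    return "\n".join(parts)


def heavy_answer(prompt: str, generate: Generate, k: int = 4) -> str:
    """Stage 2: deliberate over the k candidates with one more model call."""
    candidates = parallel_reason(prompt, generate, k)
    return generate(build_deliberation_prompt(prompt, candidates))


def toy_model(prompt: str) -> str:
    """Stand-in for a real model, only here so the sketch runs end to end."""
    if CANDIDATE_MARKER in prompt:
        return f"final answer after reviewing {prompt.count(CANDIDATE_MARKER)} candidates"
    return "draft reasoning"
```

Calling `heavy_answer("What is 2 + 2?", toy_model, k=3)` makes three stage-one calls and one deliberation call. The point of the paper, as I read it, is that RLVR trains the model to do both stages well itself, so a wrapper like this becomes optional.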
The numbers: