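The paper "Harness Updating Is Not Harness Benefit" challenges a common intuition: put the strongest model in the evolver seat so that it writes better updates. Its experiments show that the cheap Qwen3.5-9B can write prompt, memory, and skill updates that work about as well as those from Claude Opus 4.6. The expensive model is better spent as the agent that solves the task: weak models cannot correctly load or follow the updates, while strong models are already near their capability ceiling and gain little. The sweet spot is mid-tier models, which can invoke the new procedures and still have room to learn.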
Better self-improving agents need better solvers, not bigger update-writing models.
The usual intuition is to put the strongest model in the evolver seat, because a better model should write better prompts, memories, tools, and skills.
This paper cuts that intuition in half.
It separates two jobs that are usually blurred together: writing useful harness updates, and benefiting from those updates during task execution.
According to the paper, a cheaper model can often write prompt, memory, or skill updates that are good enough: a small Qwen3.5-9B evolver produces updates that help about as much as those written by Claude Opus 4.6.
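
One way to keep the two jobs apart when running your own evaluation is to cross every evolver with every solver and attribute the gain to each axis separately. The sketch below is only an illustration of that bookkeeping, not the paper's protocol: the function name `decompose_gains`, the dictionary layout, and the assumption of a full evolver-by-solver grid are all hypothetical.

```python
from statistics import mean


def decompose_gains(scores, baselines):
    """Split harness-update gains by evolver and by solver.

    Hypothetical layout (an assumption, not taken from the paper):
      scores[(evolver, solver)] -> task score when `solver` runs with the
                                   update written by `evolver`
      baselines[solver]         -> task score of `solver` with no update

    Assumes a full grid: every evolver is paired with every solver.
    """
    # Gain of each pairing over the solver's own no-update baseline.
    gains = {
        (evolver, solver): score - baselines[solver]
        for (evolver, solver), score in scores.items()
    }
    evolvers = sorted({evolver for evolver, _ in gains})
    solvers = sorted({solver for _, solver in gains})

    # Average over solvers: how much does the choice of update writer matter?
    by_evolver = {e: mean(gains[(e, s)] for s in solvers) for e in evolvers}
    # Average over evolvers: how much does each solver benefit from updates?
    by_solver = {s: mean(gains[(e, s)] for e in evolvers) for s in solvers}
    return by_evolver, by_solver
```

If the paper's claim holds on your tasks, the values in `by_evolver` should sit close together while the values in `by_solver` differ widely, with the largest gain at the mid-tier solver.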