# Harness-1: Improving Search Agent Performance by Externalizing State

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-05 09:24
- AIHOT score: 60
- AIHOT link: https://aihot.virxact.com/items/cmq08tmct03yisltrmhmdgwcu
- Original post: https://x.com/rohanpaul_ai/status/2062707191233266159

## AI Summary

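Harness-1 moves the large language model's memory work into an external helper system (the harness). This addresses an inefficiency of conventional search agents, which must handle both semantic decisions and state bookkeeping within the same context window. The model is responsible only for the key semantic choices, such as what to search and what to verify, while recoverable state (candidate pools, evidence links, deduplication records, budget-aware memory, and so on) is tracked by the harness. This separation lets a 20B-parameter model achieve better search performance. In reinforcement learning, externalized state keeps the causes of failure from being conflated, which helps policy learning. Harness-1's gains are larger on unseen benchmarks, suggesting that the model learned reusable search strategies rather than memorized domain habits. Paper: arXiv:2606.02373.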

## Main Text

Harness-1 makes search agents better by moving memory work out of the model and into a helper system.

It shows that intelligence performs better when the environment stops forcing it to spend cognition on bookkeeping.

The core argument: search agents should stop using the LLM as the notebook and let a separate harness track the search state.

The paper shows that a 20B model searched better by doing less inside its own head.

The problem is that normal search agents must both think about the next search and remember every document, clue, failed path, and remaining check inside the same limited context.

This formulation puts too much routine state management inside the policy.

Harness-1 separates those jobs.

The model keeps the hard semantic choices: what to search, what to inspect, what to verify, and when the evidence is good enough.

The harness keeps the recoverable state: candidate pools, curated documents, importance tags, evidence links, verification records, deduplicated observations, and budget-aware memory rendering.
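To make this division of labor concrete, here is a minimal Python sketch of the kind of state a harness might hold on the model's behalf. It is not the paper's implementation: the class `HarnessState`, its methods, the importance tags `high`/`medium`/`low`, and the character budget in `render` are hypothetical, chosen only to mirror the items listed above.

```python
from dataclasses import dataclass, field

# Hypothetical ordering of importance tags; a lower value is rendered first.
PRIORITY = {"high": 0, "medium": 1, "low": 2}


@dataclass
class HarnessState:
    """Hypothetical harness-side store for recoverable search state.

    Field and method names are illustrative, not taken from the Harness-1 paper.
    """
    candidates: dict[str, str] = field(default_factory=dict)      # doc_id -> snippet (candidate pool)
    curated: dict[str, str] = field(default_factory=dict)         # doc_id -> importance tag
    evidence: dict[str, list[str]] = field(default_factory=dict)  # claim -> supporting doc_ids
    verified: dict[str, bool] = field(default_factory=dict)       # claim -> verification outcome
    seen: set[str] = field(default_factory=set)                   # dedup record of observed doc_ids

    def observe(self, doc_id: str, snippet: str) -> bool:
        """Add a search result to the pool; return False if it was already seen."""
        if doc_id in self.seen:
            return False
        self.seen.add(doc_id)
        self.candidates[doc_id] = snippet
        return True

    def curate(self, doc_id: str, tag: str) -> None:
        """Keep a pooled document in the working set with an importance tag."""
        if doc_id not in self.candidates:
            raise KeyError(f"unknown document: {doc_id}")
        self.curated[doc_id] = tag

    def link(self, claim: str, doc_id: str) -> None:
        """Record that a document supports a claim."""
        self.evidence.setdefault(claim, []).append(doc_id)

    def verify(self, claim: str, ok: bool) -> None:
        """Store the outcome of a verification step for a claim."""
        self.verified[claim] = ok

    def render(self, budget_chars: int) -> str:
        """Render a compact view for the model's context within a character budget.

        Curated documents come first, most important first; rendering stops at
        the first line that would push the output past the budget.
        """
        ranked = sorted(self.curated.items(), key=lambda kv: PRIORITY.get(kv[1], len(PRIORITY)))
        lines = [f"[{tag}] {doc_id}: {self.candidates[doc_id]}" for doc_id, tag in ranked]
        for claim, docs in self.evidence.items():
            status = self.verified.get(claim)
            mark = "unverified" if status is None else ("verified" if status else "refuted")
            lines.append(f"claim '{claim}' <- {', '.join(docs)} ({mark})")
        kept, used = [], 0
        for line in lines:
            cost = len(line) + (1 if kept else 0)  # count the newline separator after the first line
            if used + cost > budget_chars:
                break
            kept.append(line)
            used += cost
        return "\n".join(kept)


if __name__ == "__main__":
    state = HarnessState()
    state.observe("d1", "Paris is the capital of France")
    state.observe("d1", "Paris is the capital of France")  # duplicate: ignored by the harness
    state.curate("d1", "high")
    state.link("capital of France", "d1")
    state.verify("capital of France", True)
    print(state.render(budget_chars=200))
```

Under these assumptions, the policy sees a bounded, deduplicated view each turn instead of an ever-growing transcript. Truncating by priority is just one simple budget policy; the paper's actual rendering scheme may differ.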

That sounds minor until you look at reinforcement learning.

RL works poorly when every failure looks the same, because an empty or wrong final set does not reveal whether the agent searched badly, forgot evidence, skipped verification, or curated carelessly.

By externalizing state, Harness-1 gives the policy a cleaner learning problem: improve decisions over a visible search workspace.
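A rough illustration of why this helps credit assignment: once the harness records what was retrieved, kept, and checked, a failed episode can be traced to a stage instead of being scored as one undifferentiated failure. The sketch below assumes a hypothetical `EpisodeTrace` snapshot and a hypothetical `diagnose` function; the stage names and their ordering are illustrative and are not the reward design used in Harness-1.

```python
from dataclasses import dataclass


@dataclass
class EpisodeTrace:
    """Hypothetical end-of-episode snapshot of harness state (not from the paper)."""
    observed: set[str]  # every doc_id the search tool returned
    curated: set[str]   # doc_ids the policy kept in its working set
    verified: set[str]  # doc_ids the policy explicitly checked
    final: set[str]     # doc_ids submitted as the answer


def diagnose(trace: EpisodeTrace, gold: set[str]) -> str:
    """Attribute a failure to the earliest illustrative stage that lost a gold document."""
    if gold <= trace.final:
        return "success"
    missing = gold - trace.final
    if missing - trace.observed:
        return "search"        # some gold doc was never retrieved
    if missing - trace.curated:
        return "curation"      # retrieved, but dropped from the working set
    if missing - trace.verified:
        return "verification"  # kept, but never checked before answering
    return "answer"            # kept and checked, yet left out of the final set


if __name__ == "__main__":
    gold = {"g1", "g2"}
    trace = EpisodeTrace(observed={"g1", "g2"}, curated={"g1"}, verified={"g1"}, final={"g1"})
    print(diagnose(trace, gold))
```

With only the final set, the episode above is simply wrong; with the recorded trace, it is distinguishable as "found the document but dropped it", which is the kind of signal the thread argues a cleaner RL problem provides.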

Harness-1's gains were larger on held-out benchmarks than on source-family tasks, suggesting that the model learned reusable search moves rather than memorized domain habits.

----

Link - arxiv.org/abs/2606.02373

Title: "Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses"
