# 搜索智能体遮蔽陈旧观察的机制图与效果边界

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-29 08:00
- AIHOT 分数：57
- AIHOT 链接：https://aihot.virxact.com/items/cmpw3ayog03zhsluk4zr9ho96
- 原文链接：https://arxiv.org/abs/2606.00408

## AI 摘要

该研究系统评估了观察遮蔽策略在不同规模（4B至284B参数）模型骨干与三种检索器上的效果。发现其准确率增益相对于模型无管理时的准确率呈非对称倒U型曲线：弱检索器下效果平缓，强检索器与中等容量模型结合时达到峰值，模型能力饱和后性能急剧下降。其机制源于检索器召回率与模型隐式过滤能力的交互。遮蔽本质上是一种用轮次换token的权衡，它移除了模型已基本忽略的观察；当新增轮次能将失败转化为成功时有益，但当移除模型本会使用的证据时则会失效。

## 正文

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps and why. We study observation masking through a systematic sweep over various agent backbones (4B to 284B parameters) and three retrievers on offline and live-web agentic search benchmarks. We find that the accuracy gain from masking follows an asymmetric inverted-U shape when plotted against the model's accuracy without context management: a plateau under weak retrievers, a peak when a strong retriever meets a mid-capacity model, and a sharp collapse when the model is saturated. This pattern reflects the interaction between retriever recall and the model's implicit filtering capacity, rather than either factor in isolation. Mechanistically, masking implements a token-for-turn trade-off: it removes observations the model has largely stopped attending to and pages the agent rarely re-opens. The added turns help when they convert failures into successes, but they fail when masking removes evidence the model would otherwise have used. We therefore reframe context management as a regime-dependent intervention and provide a holistic perspective for analyzing context use in agentic deep search. We release our scaffold and trajectories here (https://github.com/i-DeepSearch/observation-masking) to support future research.
