SAAS: Self-Aware Reinforcement Learning for Mitigating Over-Search in Agentic Search
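Source: arxiv.org

Agentic search lets large language models solve complex problems through iterative reasoning and external search, but models often lack self-awareness and over-search, which adds latency and cost. SAAS is a reinforcement learning framework that cultivates dynamic self-awareness to precisely regulate search behavior. Its core consists of a search boundary modeling mechanism, a boundary-aware reward module, and a stage-wise optimization strategy whose sequential curriculum prioritizes optimizing reasoning over search regularization. Experiments show that SAAS substantially reduces unnecessary searches while maintaining accuracy.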
Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. This lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel reinforcement learning (RL) framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code is anonymously released at https://github.com/XMUDeepLIT/SAAS.
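The abstract describes the three components only at a high level. The Python sketch below shows one plausible way they could fit together for a single question; it is an illustration under stated assumptions, not the SAAS implementation. The `Rollout` record, the accuracy `threshold` used to decide whether a question lies inside the model's internal-knowledge boundary, the penalty weights `alpha` and `beta`, the `max_penalty` cap, and the two-stage `stage` switch are all hypothetical choices; the released repository is the authority on the actual reward definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rollout:
    # Hypothetical record of one sampled trajectory.
    correct: bool       # did the final answer match the reference?
    num_searches: int   # number of search calls issued in this trajectory


def internal_accuracy(no_search: List[Rollout]) -> float:
    """Fraction of search-disabled rollouts that answered correctly."""
    if not no_search:
        return 0.0
    return sum(r.correct for r in no_search) / len(no_search)


def min_correct_searches(with_search: List[Rollout]) -> Optional[int]:
    """Fewest search calls among correct search-enabled rollouts (None if none are correct)."""
    counts = [r.num_searches for r in with_search if r.correct]
    return min(counts) if counts else None


def boundary_aware_rewards(
    no_search: List[Rollout],
    with_search: List[Rollout],
    stage: int,
    threshold: float = 0.5,    # hypothetical: boundary test on search-disabled accuracy
    alpha: float = 0.2,        # hypothetical: penalty per unnecessary search
    beta: float = 0.1,         # hypothetical: penalty per redundant search
    max_penalty: float = 0.5,  # hypothetical cap so correct always beats incorrect
) -> List[float]:
    """Score the search-enabled rollouts of one question.

    Stage 1: outcome reward only (optimize reasoning first).
    Stage 2: add boundary-aware penalties to correct trajectories.
    """
    # Contrast the two rollout groups to estimate the search boundary.
    knows_answer = internal_accuracy(no_search) >= threshold
    fewest = min_correct_searches(with_search)

    rewards = []
    for r in with_search:
        reward = 1.0 if r.correct else 0.0
        if stage >= 2 and r.correct:
            if knows_answer:
                # Inside the internal-knowledge boundary: every search is unnecessary.
                penalty = alpha * r.num_searches
            else:
                # Outside the boundary: penalize searches beyond the fewest that sufficed.
                # `fewest` is defined here because r itself is a correct rollout.
                penalty = beta * (r.num_searches - fewest)
            reward -= min(penalty, max_penalty)
        rewards.append(reward)
    return rewards
```

Capping the penalty keeps every correct trajectory rewarded above every incorrect one, so the policy cannot raise its reward by answering wrongly without searching; this is one simple guard in the spirit of the abstract's reward-hacking concern, and delaying the penalties to stage 2 mirrors the "reasoning first" curriculum, but both are assumptions rather than the paper's stated design.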