# Agentic Abstention：大语言模型智能体何时应停止行动

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-06-27 08:00
- AIHOT 分数：52
- AIHOT 链接：https://aihot.virxact.com/items/cmr0fzj2b01i4sloldjjb9s0k
- 原文链接：https://arxiv.org/abs/2606.28733

## AI 摘要

研究定义Agentic Abstention问题，即智能体在不确定环境下应判断何时停止行动。在网页购物、终端环境、问答等任务上评估13个LLM智能体系统和2个智能体框架对28,000+任务的表现。结果显示关键挑战在于停止时机：部分从不停止，部分在大量冗余交互后才停止。提出CONVOLVE上下文工程方法，将完整交互轨迹蒸馏为可复用停止规则，在WebShop上将Llama-3.3-70B的及时召回率从26.7%提升至57.4%。数据集与代码已开源。

## 正文

LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available environment. In such cases, a reliable agent should recognize that further interaction is unlikely to help and abstain from additional tool calls. We define Agentic Abstention, the problem of deciding when an agent should stop acting under uncertainty. Unlike standard LLM abstention, which is usually evaluated as a single-turn answer-or-abstain decision, agentic abstention is a sequential decision problem: an agent can answer, abstain, or gather more information at each turn, and the need to abstain may only become clear after interacting with the environment. We study this problem across web shopping, terminal environments, and question answering, evaluating 13 LLM-as-agent systems and 2 agent scaffolds on more than 28,000 tasks. Our results show that the main challenge is not only whether agents can abstain, but also when they abstain. Some agents never abstain when they should, while others do so only after many unnecessary interactions. This gap is especially large on tasks where the instruction appears feasible until the environment reveals otherwise (e.g., no valid result matches the instruction). We further find that model scale, reasoning, and agent scaffolding affect abstention in different ways, where larger or more capable models sometimes perform worse at timely abstention. Finally, we introduce CONVOLVE, a context engineering method for improving agentic abstention that distills full interaction trajectories into reusable stopping rules. On WebShop, CONVOLVE substantially improves timely abstention without updating model parameters, raising Llama-3.3-70B's timely recall rate from 26.7 to 57.4. Our dataset and code are available at https://lhannnn.github.io/agentic-abstention
