Critic-R: Improving Retrieval Models for Agentic Search with Natural-Language Introspective Feedback

2026-05-30 08:00 · 34 days ago

AI Summary

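The Critic-R framework explicitly closes the feedback loop between the reasoning agent and the retrieval model at both inference and training time. It introduces a critic model that evaluates the agent's introspective reasoning trace and judges whether the retrieved context sufficiently supports the next reasoning step. The framework comprises two mechanisms: Critic-R-Zero iteratively rewrites queries and retrieval instructions at inference time, while Critic-Embed uses successful and failed refinement trajectories as automatic supervision for training the retrieval model, with no manual annotation required. Experiments on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle show that Critic-R significantly improves retrieval quality and answer accuracy.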

Original text · untranslated

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.
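The inference-time refinement loop described above can be illustrated with a small toy sketch. This is not the paper's implementation: the word-overlap retriever, the keyword-based critic, the `rewrite` function, and every name below (`Step`, `retrieve`, `refine_loop`, `make_keyword_critic`) are hypothetical stand-ins chosen for illustration. In the framework itself the critic evaluates the agent's reasoning trace and the feedback is natural language; here the feedback is simply the list of missing keywords.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One round of the loop: the query used, what came back, the critic's verdict."""
    query: str
    docs: List[str]
    sufficient: bool


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def make_keyword_critic(required: List[str]) -> Callable[[List[str]], Tuple[bool, str]]:
    """Toy critic (assumption): the context is 'sufficient' if every required word appears."""
    def critic(docs: List[str]) -> Tuple[bool, str]:
        seen = set(" ".join(docs).lower().split())
        missing = [w for w in required if w not in seen]
        return (not missing, " ".join(missing))
    return critic


def rewrite(query: str, feedback: str) -> str:
    """Toy query rewriter (assumption): append the critic's feedback to the query."""
    return f"{query} {feedback}".strip()


def refine_loop(question: str, corpus: List[str],
                critic: Callable[[List[str]], Tuple[bool, str]],
                max_rounds: int = 3) -> List[Step]:
    """Retrieve, ask the critic, rewrite the query on failure, and log every round."""
    query, trajectory = question, []
    for _ in range(max_rounds):
        docs = retrieve(query, corpus)
        ok, feedback = critic(docs)
        trajectory.append(Step(query, docs, ok))
        if ok:
            break
        query = rewrite(query, feedback)
    return trajectory


corpus = [
    "berlin is the capital of germany",
    "the eiffel tower is in paris",
    "the brandenburg gate is in berlin",
    "paris is the capital of france",
]
critic = make_keyword_critic(["eiffel", "france"])
trajectory = refine_loop("which country has the eiffel tower", corpus, critic)
for step in trajectory:
    print(step.sufficient, "|", step.query)
```

The logged `trajectory` contains both failed and successful rounds. Under the assumptions of this sketch, that is the kind of record an approach like Critic-Embed could mine for automatic supervision, although the abstract does not specify how the training signal is constructed.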

HuggingFace Daily Papers (community trending papers)

Read the original: arxiv.org
