# RL-Index: A Reinforcement Learning Approach to Retrieval Index Reasoning

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-15 08:00
- AIHOT score: 41
- AIHOT link: https://aihot.virxact.com/items/cmqt3vj490066sl0euiyvn3ob
- Original link: https://arxiv.org/abs/2606.16316

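## AI Summary

RL-Index is an agentic indexing framework that formulates retrieval index reasoning as a reinforcement learning problem. At indexing time it augments documents with LLM-generated rationales, and it uses Group Relative Policy Optimization (GRPO) with retrieval similarity as a verifiable reward signal to optimize indexing decisions directly for retrieval effectiveness. On the BRIGHT benchmark, RL-Index consistently improves retrieval and downstream question-answering performance while significantly reducing online inference latency. The learned rationale augmentation also generalizes across different retrievers and generators, so it can serve as a plug-and-play indexing strategy.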

## Full Text

Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems relying on the same theorem or coding requiring deep reasoning). Existing approaches primarily rely on query-side reasoning (e.g., query rewriting), which introduces significant online latency and underutilizes the opportunity to perform reasoning over the knowledge corpus itself (i.e., index-side reasoning). In this paper, we propose RL-Index, an agentic indexing framework that formulates retrieval index reasoning as a reinforcement learning problem. Instead of performing reasoning at query time, RL-Index shifts reasoning to the indexing stage by augmenting documents with LLM-generated rationales that explicitly encode the latent query-knowledge relationship. To optimize the quality of these rationales, we employ Group Relative Policy Optimization (GRPO) and use retrieval similarity as a verifiable reward signal, enabling direct optimization of indexing decisions for retrieval effectiveness. Extensive experiments on the BRIGHT benchmark demonstrate that RL-Index consistently improves both retrieval and downstream question-answering performance, while significantly reducing online inference latency. Moreover, the learned rationale augmentation generalizes across diverse retrievers and generators, highlighting its robustness as a plug-and-play indexing strategy across different retrieval systems.
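To make the training signal described in the abstract more concrete, the following is a minimal sketch of how a retrieval-similarity reward and GRPO-style group-relative advantages might fit together. It is not the authors' implementation: the abstract does not specify the encoder, the exact reward, or the normalization. The bag-of-words `embed` function stands in for a real retriever encoder, the names `retrieval_reward` and `group_relative_advantages` are hypothetical, and the mean/standard-deviation normalization follows the commonly used GRPO formulation.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real retriever encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_reward(query, document, rationale):
    """Reward = similarity between the query and the rationale-augmented document."""
    return cosine(embed(query), embed(document + " " + rationale))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its sampled group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical example: one document, one training query, three sampled rationales.
query = "which theorem bounds the norm of a sum"
document = "for any vectors x and y we have |x + y| <= |x| + |y|"
rationales = [
    "this states the triangle inequality theorem which bounds the norm of a sum",
    "this passage is about the norm of vectors",
    "unrelated note on cooking pasta",
]

rewards = [retrieval_reward(query, document, r) for r in rationales]
advantages = group_relative_advantages(rewards)
for rationale, adv in zip(rationales, advantages):
    print(f"{adv:+.3f}  {rationale}")
```

In this sketch, rationales that bring the augmented document closer to the query receive positive advantages and the rest receive negative ones, which is the signal a policy-gradient update would use to favor more retrieval-friendly rationales. Because the rationale is generated once at indexing time, none of this computation happens at query time.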