GrepSeek: Training Search Agents for Direct Corpus Interaction
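Source: arxiv.org
GrepSeek is an optimized direct corpus interaction search agent. It treats a large text corpus as its environment and finds and composes evidence by executing shell commands, addressing limitations of conventional retrieval systems. Applying reinforcement learning directly on large corpora is unstable, so the authors propose a two-stage training pipeline. First, an answer-aware "Tutor" and an answer-blind "Planner" construct a cold-start dataset. Then the agent is trained with Group Relative Policy Optimization, which lets it improve its search behavior through direct interaction with the corpus. In addition, a semantics-preserving sharded-parallel execution engine substantially speeds up retrieval while keeping results byte-identical. Experiments show that GrepSeek performs strongly across multiple open-domain question answering benchmarks.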
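The GRPO stage refines the policy using rewards that are compared within a group. In the usual formulation of GRPO, several trajectories are sampled for the same question. Each trajectory's advantage is its reward minus the group mean, divided by the group standard deviation, so no learned value model is needed. The sketch below shows only that normalization step. It omits the clipped policy-ratio objective and the KL penalty of full GRPO. The function name and the reward values are hypothetical, since the reward used for GrepSeek is not specified here.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each trajectory's reward against its own group.

    All rewards are assumed to come from trajectories sampled for the SAME
    question; that shared group is what makes the baseline "group relative".
    (Hypothetical helper for illustration, not GrepSeek's code.)
    """
    mean = statistics.fmean(rewards)
    # Population standard deviation over the group; eps avoids dividing by
    # zero when every trajectory in the group earned the same reward.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards, e.g. answer F1 for four rollouts on one question.
    group = [1.0, 0.0, 0.5, 0.5]
    print([round(a, 3) for a in group_relative_advantages(group)])
```

Above-average trajectories receive positive advantages and below-average ones receive negative advantages. A group in which every rollout scores the same contributes no learning signal.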
Large language model (LLM) search agents have shown strong promise on knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most existing systems access information through a retriever that takes a keyword or natural-language query and returns a ranked list of documents from an index of precomputed document representations. In this work, we explore a complementary perspective in which the search agent treats the corpus itself as the search environment and finds evidence by issuing executable shell commands. We introduce GrepSeek, an optimized direct corpus interaction (DCI) search agent: a compact model trained to find, filter, and compose evidence from large text corpora. To address the instability of learning search behavior directly through reinforcement learning on large corpora, we propose a two-stage training pipeline. First, we construct a cold-start dataset using an answer-aware Tutor and an answer-blind Planner to generate verified, causally grounded search trajectories. Second, we refine the initialized policy with Group Relative Policy Optimization (GRPO), allowing the agent to improve its task-oriented search behavior through direct interaction with the corpus. To make DCI practical at scale, we further use a semantics-preserving sharded-parallel execution engine that accelerates shell-based retrieval by up to 7.6× while producing output byte-for-byte identical to sequential execution of the same shell command. Experiments across seven open-domain question answering benchmarks show that GrepSeek achieves the strongest overall token-level F1 and Exact Match scores. Our analysis also highlights the limitations of purely lexical interaction on queries with substantial surface-form variation, and suggests that DCI is a practical and competitive approach for search agents that can complement existing retrieval paradigms in real-world settings.
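The byte-exact guarantee can be illustrated with the simplest case: a line-local filter such as a plain `grep PATTERN`, whose output for each line depends only on that line. The following is a hypothetical sketch, not GrepSeek's engine, whose design is not described here. It cuts the corpus only at newline boundaries, filters each shard independently, and concatenates the results in shard order. This produces exactly the bytes a single sequential pass would. The names `split_at_line_boundaries`, `grep_lines`, and `sharded_grep` are illustrative. The filter uses Python's `re` module instead of invoking the real `grep`. Threads keep the example self-contained. Because CPython's `re` holds the GIL, a real speedup would require separate processes or parallel command invocations.

```python
import io
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_at_line_boundaries(data: bytes, num_shards: int) -> List[bytes]:
    """Cut `data` into at most `num_shards` contiguous pieces.

    Each tentative cut is pushed forward to just past the next newline,
    so no line is ever split across two shards. (Hypothetical helper.)
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    target = max(1, -(-len(data) // num_shards))  # ceiling division
    shards, start = [], 0
    while start < len(data):
        newline = data.find(b"\n", start + target - 1)
        end = len(data) if newline == -1 else newline + 1
        shards.append(data[start:end])
        start = end
    return shards


def grep_lines(chunk: bytes, regex: "re.Pattern[bytes]") -> bytes:
    """Grep-like filter: keep matching lines in their original order,
    with their original newline terminators. (Hypothetical helper.)"""
    # Iterating a binary stream splits on b"\n" only, as grep does.
    return b"".join(line for line in io.BytesIO(chunk) if regex.search(line))


def sharded_grep(data: bytes, pattern: bytes, num_shards: int = 4) -> bytes:
    """Filter each shard in parallel, then stitch the results together."""
    regex = re.compile(pattern)
    shards = split_at_line_boundaries(data, num_shards)
    if not shards:
        return b""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # Executor.map yields results in submission order, not completion
        # order, so concatenation reproduces the sequential output exactly.
        return b"".join(pool.map(lambda s: grep_lines(s, regex), shards))


if __name__ == "__main__":
    corpus = b"Paris is the capital of France\nBerlin is in Germany\nLyon is in France\n"
    sequential = grep_lines(corpus, re.compile(rb"France"))
    assert sharded_grep(corpus, rb"France", num_shards=2) == sequential
    print(sequential.decode(), end="")
```

Plain concatenation works only because the filter is line-local. Commands such as `grep -c`, `grep -n`, context options like `-C`, or `sort` would each need a command-specific merge step to keep the same guarantee.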