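Research shows that AI agents that search raw data directly with basic terminal tools such as grep and file reads substantially outperform traditional semantic retrieval systems on several benchmarks. On BrowseComp-Plus, for example, terminal search raised accuracy from 69% to 80% while lowering cost. The core claim is that retrieval is not only a model problem but also an interface problem. Direct corpus interaction lets an agent run exact string searches, inspect surrounding context, and keep testing its hypotheses, so it extracts more usable evidence from the documents it has already located. The gains come mainly from making fuller use of documents already found, not from finding more relevant documents. The limitation is that as the corpus grows, the cost of locating an initial anchor rises quickly, so terminal search cannot fully replace large indexes. For strong AI agents, though, the performance bottleneck may be how deeply their tools let them "touch" the data.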
Better search may come less from smarter indexes than from giving agents a richer way to touch text.
Research shows that AI agents searching raw data with basic terminal tools such as grep, file reads, and shell commands perform far better than conventional retrieval systems on multiple benchmarks.
On BrowseComp-Plus, swapping semantic retrieval for terminal search raised accuracy from 69% to 80% while lowering cost.
The deeper point is not that grep is magically smarter than embeddings.
It is that retrieval is usually treated as a model problem, when it is also an interface problem.
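To make the interface point concrete, here is a minimal Python sketch of the kind of interaction a terminal gives an agent: an exact string search that also returns the lines around each hit, roughly what `grep -F -C 1` does. The function name `grep_with_context`, the `.txt` file layout, and the toy corpus are assumptions made up for illustration; they are not taken from the research or from BrowseComp-Plus.

```python
import tempfile
from pathlib import Path


def grep_with_context(root, needle, context=1):
    """Exact-substring search over *.txt files under `root`.

    Returns a list of (file name, 1-based line number, surrounding lines),
    so the caller sees each hit together with its neighbouring lines.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            if needle in line:  # literal match, no embeddings involved
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                hits.append((path.name, i + 1, lines[lo:hi]))
    return hits


# Toy corpus, invented purely for this example.
with tempfile.TemporaryDirectory() as corpus:
    Path(corpus, "a.txt").write_text(
        "intro\nthe launch was in 1998\noutro\n", encoding="utf-8"
    )
    Path(corpus, "b.txt").write_text("nothing relevant here\n", encoding="utf-8")
    hits = grep_with_context(corpus, "1998")

for name, line_no, snippet in hits:
    print(f"{name}:{line_no}", snippet)
```

An agent with this kind of tool can follow a hit with another exact query drawn from the surrounding lines, which is one way to read the claim that the gains come from using already-found documents more fully. Note that the sketch scans every file on every query, which is also why the approach gets expensive as the corpus grows.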