Rohan Paul@rohanpaul_ai

2026-05-17 19:23·46天前

AI 摘要

研究表明，AI代理使用grep、文件读取等基础终端工具直接搜索原始数据，在多项基准测试中表现远超传统语义检索系统。例如，在BrowseComp-Plus基准上，终端搜索将准确率从69%提升至80%，同时降低成本。核心观点在于，检索不仅是模型问题，更是交互界面问题。直接语料交互允许代理进行精确字符串搜索、检查上下文并持续验证假设，从而从已定位文档中提取更多有效证据，其增益主要来自更充分地利用已发现文档，而非找到更多相关文档。局限性在于，随着语料库规模扩大，定位初始锚点的成本迅速增加，因此终端搜索无法完全替代大型索引。但对于强大AI代理，性能瓶颈可能在于工具允许其“触及”数据的深度。

Better search may come less from smarter indexes than from giving agents a richer way to touch text.

Shows that AI agents using basic terminal tools like grep， file reads， and shell commands to search raw data perform far better than conventional retrieval systems on multiple benchmarks.

On BrowseComp-Plus， swapping semantic retrieval for terminal search raised accuracy from 69% to 80% while lowering cost.

The deeper point is not that grep is magically smarter than embeddings.

It is that retrieval is usually treated as a model problem， when it is also an interface problem.

A conventional retriever turns the corpus into a narrow ritual： ask once， receive a ranked list， reason over whatever survived.

That works when the question is close to a document's semantic center， but it breaks when the answer depends on exact phrases， faint clues， document structure， or a chain of small discoveries.

Direct Corpus Interaction changes the shape of the task.

The agent can search an exact string， inspect nearby context， notice a new entity， constrain the search again， and keep testing its hypothesis against the raw files.

Here's the part most people miss： the gain was not mainly from finding more gold documents， but from extracting more usable evidence once a promising document was reached.

That makes DCI less like a better search engine and more like giving the model fingers.

The limitation is real： as the corpus grows， the cost of finding the first useful anchor rises quickly， and blunt terminal search will not replace indexes for every large， static collection.

But the paper's lesson still lands cleanly.

For capable agents， the bottleneck may no longer be only what they know， or even how they reason， but how much of the world their tools allow them to touch.

Rohan Paul@rohanpaul_ai · X

62导出 Markdown