AI search agents often confirm what they already know instead of actually researching the web

2026-05-31 15:48 · 32 days ago · Jonathan Kemper

AI summary

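Researchers at the Harbin Institute of Technology found that leading AI search agents, including GPT-5.4 and Kimi-K2.6, do little genuine web research on established benchmarks. They mainly use the web to confirm knowledge they acquired during training. The team reached this conclusion with a new benchmark called LiveBrowseComp, which covers only events from the past 90 days. When models cannot rely on what they have memorized, their performance drops sharply and the existing performance rankings shift.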

Original text

AI search agents often confirm what they already know instead of actually researching the web

A new study suggests that leading AI search agents do little actual research on established benchmarks; they mostly use the web to confirm answers they already have. Once models have to go beyond their existing knowledge, search performance falls apart.

Frontier models like GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6, DeepSeek-V4-Pro, and Kimi-K2.6 keep posting higher scores on BrowseComp. The benchmark asks agents complex questions that can only be answered through multi-step browsing and piecing together information from different web sources.

Researchers from the Harbin Institute of Technology and Xiaohongshu have now shown in a study that these results say less about the agents' research skills than assumed. The authors call it "intrinsic knowledge dependence" (IKD), a reliance on internal knowledge the models absorbed during training.

The researchers tested eleven models in total, first stripping away all search and browsing tools. Even without internet access, the models scored surprisingly high. MiniMax M2.5 solved 44.5 percent of BrowseComp tasks from memory alone. Kimi-K2.6 hit 62 percent on the Chinese BrowseComp-ZH variant. A big chunk of benchmark performance, in other words, comes before any search even happens.

Searching can actually hurt the answer

The second test is more telling. The researchers left the search interface in place but removed all answer-supporting documents from the search index. Every model tested then performed worse than it did without any tool access at all. MiniMax M2.5 dropped from 44.5 to 8.0 percent. Kimi-K2.6 fell from 25.5 to 2.3 percent. The search actively pulls agents away from correct gut-feeling answers as soon as no confirming hits show up.
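To put those drops in proportion, here is a minimal Python sketch that turns the two pairs of scores quoted above into absolute and relative losses. The dictionary `REPORTED` and the helpers `accuracy_drop` and `summarize` are illustrative names, not part of the study, and the numbers are only the ones reported in this article.

```python
# Scores in percent as quoted in the article:
# (no tools at all, search enabled but answer-supporting documents removed).
REPORTED = {
    "MiniMax M2.5": (44.5, 8.0),
    "Kimi-K2.6": (25.5, 2.3),
}


def accuracy_drop(no_tools, answer_free_search):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = no_tools - answer_free_search
    relative = 100.0 * absolute / no_tools
    return absolute, relative


def summarize(scores):
    """Map each model to its drops, rounded to one decimal place."""
    return {
        model: tuple(round(value, 1) for value in accuracy_drop(*pair))
        for model, pair in scores.items()
    }


if __name__ == "__main__":
    for model, (abs_drop, rel_drop) in summarize(REPORTED).items():
        print(f"{model}: -{abs_drop} points ({rel_drop}% relative)")
```

On these figures, MiniMax M2.5 loses 36.5 points, about 82 percent of its tool-free score, and Kimi-K2.6 loses 23.2 points, about 91 percent.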

An analysis of the search paths explains why. More than half of all queries come from the model's own reasoning rather than from previously found hits. Even when relevant evidence does appear in search results, the agents fold it into their reasoning less than a third of the time. The loop is model-led, not evidence-led.

The Decoder: AI News (RSS)


Read the original: the-decoder.com
