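A study found that grep-style text search, placed in the right agent harness, can match or even beat embedding-based retrieval on coding-agent tasks. This raises questions about whether vector databases are necessary; the core argument is that coding agents may not need better embedding models but rather better harness design around basic tools. The author suggests that coding-agent stacks relying on a vector database should be re-evaluated. Vector databases still have advantages at scale, but well-designed agentic search already covers most use cases. For now, a hybrid approach combining the two usually works best, but it is not yet well understood.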
// Is Grep All You Need? //
Pay attention to this one, AI devs.
(bookmark it)
They find that grep-style text search, when wrapped in the right agent harness, matches or beats embedding-based retrieval on coding-agent tasks.
Are vector databases even needed, given where this is all going?
It might be that what coding agents needed was not better embeddings. It was better harness design around primitive tools.
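To make "harness design around primitive tools" concrete, here is a minimal sketch of what a grep-style search tool exposed to an agent could look like. It is an illustration only, not the setup used in the paper: the function name `grep_tool`, its parameters, and the returned dictionary shape are all hypothetical. The harness-level choices it shows are a hit cap, a little surrounding context per match, and an explicit `truncated` flag so the agent knows when to narrow its pattern.

```python
import re
from pathlib import Path


def grep_tool(root, pattern, glob="*.py", max_hits=20, context=1):
    """Hypothetical agent tool: regex search over files under `root`.

    Returns structured hits rather than raw text so the agent can decide
    which file to open next. All names here are illustrative assumptions.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        except OSError:
            continue  # unreadable file: skip it instead of failing the tool call
        for i, line in enumerate(lines):
            if regex.search(line):
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                hits.append({
                    "path": str(path),
                    "line": i + 1,  # 1-indexed, like grep -n
                    "snippet": "\n".join(lines[lo:hi]),
                })
                if len(hits) >= max_hits:
                    # Cap reached: there may be more matches, so tell the agent.
                    return {"hits": hits, "truncated": True}
    return {"hits": hits, "truncated": False}
```

An agent loop would call something like `grep_tool("src", r"def parse_\w+")`, read the snippets, and either open a file or retry with a tighter pattern when `truncated` is set.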
If you operate a coding-agent stack that depends on a vector DB, it might be time to re-evaluate.
In my experience, agentic search, if done right, is more than good enough for a lot of use cases. But you also have to understand how to properly index and structure information so the agents can take advantage of it. At scale, vector databases do shine, so take that into account as well. A hybrid approach often works best, but that's something we haven't really figured out yet.
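One simple way to picture a hybrid setup is to merge the ranked results from a grep-style search and a vector search. The sketch below uses reciprocal rank fusion, a generic rank-merging technique that I am swapping in purely for illustration; it is not something the post or the paper prescribes. The function name, the file names, and the constant `k=60` (a commonly used default) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in,
    and the scores are summed. `k` dampens the weight of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
grep_hits = ["a.py", "b.py", "c.py"]
vector_hits = ["b.py", "d.py", "a.py"]
fused = reciprocal_rank_fusion([grep_hits, vector_hits])
```

Files that both retrievers surface rise to the top, while a file found by only one of them still stays in the list.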