Google DeepMind论文指出,AI智能体的安全威胁不仅源于模型本身,更在于其实时交互的信息环境。研究首次系统阐述了如何将网络武器化以攻击自主智能体,并提出了针对感知、推理、记忆、行动等维度的“AI智能体陷阱”分类法。关键发现是,对智能体构成威胁的网页无需呈现恶意外观,因为它们可能解析人类不可见的隐藏内容。一旦引入RAG等记忆机制,潜伏的记忆污染攻击成功率可超过80%。研究强调,当智能体能在推理时摄取网络信息,每个页面、文档和记忆写入都成为了安全边界的一部分。
Google DeepMind's paper shows that the real security problem for AI agents is not just the model, but the environment it reads.
Presents the first systematic framework for understanding how the web itself can be weaponized against autonomous AI agents. As agents increasingly browse the internet, read emails, execute transactions, and spawn sub-agents, the information environment becomes an attack surface.
In one cited benchmark, hidden prompt injections embedded in web content partially commandeered agents in up to 86% of scenarios, sub-agent hijacking working 58-90% of the time, and data exfiltration attacks clearing 80% across five different agent architectures.