论文指出AI智能体在部署后,其记忆系统会因摘要、存储、更新和维护而逐渐“衰老”,导致信息丢失、混淆、过时或被破坏。智能体看似仍能工作,但可靠性已悄然下降。为此提出AgingBench基准,用于评估智能体在多会话中的持续可靠性。论文将智能体比作会衰老的基础设施,强调单纯增加记忆并非解决方案。
Super important paper from Univ of Texas.
AI agents can slowly become less reliable after deployment, even when the model itself does not change.
The problem is that agents are often judged when they are fresh, but real agents keep changing because they summarize old chats, store more memories, update facts, and go through maintenance.
An agent that remembers you across weeks is really a small operating system wrapped around a language model: it writes notes, compresses them, retrieves them, updates them, and occasionally cleans house.
Every one of those steps can quietly rot.
A medication dose can become "a daily medication," two similar clients can blur into one, a canceled subscription can remain active, and a schedule can vanish after a maintenance pass.