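A new paper argues that AI agents currently lack a real memory system. Existing tests check only final answers and overlook how the memory system itself performs. The paper splits agent memory into four parts (storage, fact extraction, retrieval of useful memories, and maintenance of old or conflicting memories) and evaluates 12 memory systems across 5 workloads and 11 datasets. Key finding: no single memory design wins in every setting. Graph memory is good at linked facts, hybrid systems are good at filtered search, and raw traces perform best on exact action-history records.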
This paper asks whether AI agents have a real memory system yet, and finds the answer is mostly no.
The problem is that AI agents now need memory that can store, search, update, and clean up information across long tasks.
The authors say current tests mostly check final answers, so they miss whether the memory system itself is fast, reliable, or good at handling changed facts.
They split agent memory into 4 parts: how memories are stored, how facts are extracted, how useful memories are found, and how old or conflicting memories are maintained.
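To make that four-part split concrete, here is a minimal toy sketch in Python. It is not taken from the paper or from any of the systems it evaluates: the class `ToyMemory`, its method names, and the "subject is value" sentence pattern are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical toy memory showing the four parts; not a real system."""

    # Storage: current facts keyed by subject, plus an append-only raw trace.
    facts: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)

    def extract(self, text):
        """Extraction: parse 'subject is value' into a (subject, value) pair."""
        subject, sep, value = text.partition(" is ")
        if not sep:
            return None  # no fact found in this sentence
        return subject.strip().lower(), value.strip().rstrip(".")

    def write(self, text):
        """Store the raw text, then extract and maintain any fact in it."""
        self.trace.append(text)
        fact = self.extract(text)
        if fact is None:
            return
        subject, value = fact
        # Maintenance: a newer statement overwrites an older conflicting one.
        self.facts[subject] = value

    def retrieve(self, query):
        """Retrieval: return facts whose subject appears in the query."""
        q = query.lower()
        return {s: v for s, v in self.facts.items() if s in q}


mem = ToyMemory()
mem.write("Alice's office is Room 12.")
mem.write("Alice's office is Room 30.")  # conflicting update
print(mem.retrieve("Where is Alice's office?"))  # {"alice's office": 'Room 30'}
```

Checking only the final answer here would hide that the raw trace still holds both statements while the fact store keeps just the latest one, which is the kind of memory-level behavior the authors say current tests miss.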
They tested 12 memory systems across 5 workloads and 11 datasets, including long conversations, multi-session recall, database tasks, and update-heavy settings.