Rohan Paul@rohanpaul_ai

2026-06-04 18:45

AI Summary

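A study from labs including the University of Illinois and Tsinghua University finds that when LLM agents repeatedly rewrite their own memory, the memory becomes less reliable. Raw experiences (the actual past attempts and solutions) are often more useful than distilled summaries. In testing, GPT-5.4 solved 100% of a small ARC-AGI dataset with no memory, but dropped to about 54% after memory was built and continuously updated. Failure causes include bad grouping, overgeneralized lessons, and overfitting. The study recommends that agents not automatically rewrite every experience into a summary; keeping raw evidence and summarizing only occasionally works better.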

This study from Illinois, Tsinghua University, and other labs finds that LLM agents still have unreliable memory, and that it can get worse when they keep rewriting their own memories.

LLM agents can learn from experience, but their rewritten memories often become unreliable.

The problem is that many agent systems store past work by asking an LLM to compress messy experience into neat written lessons.

That sounds useful because the agent should remember what worked before, but the paper finds that repeated rewriting slowly damages the memory.

The core idea is that raw episodes, meaning the actual past attempts and solutions, often stay more useful than the polished lessons made from them.

The authors tested this across tasks like web shopping, simulated worlds, app use, and ARC-style puzzle problems where they could control the correct solutions.

The sharpest result is that GPT-5.4 solved 100% of a small ARC-AGI set with no memory, but once a memory was built from correct solutions and then updated in a streaming fashion, accuracy dropped to about 54%.

The failures came from bad grouping, overbroad lessons, and overfitting, so the memory forgot details, mixed up task types, or learned rules that only worked on narrow examples.

The big deal is that agent memory should not automatically rewrite every experience into a summary, because keeping raw evidence and summarizing only occasionally worked better.

The paper is really proposing that agent memory should treat raw past episodes as important evidence, not as disposable notes to summarize away.
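
To make the "keep raw episodes, summarize only sometimes" idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the names `Episode`, `EpisodicMemory`, and `summarize_every`, the word-overlap retrieval, and the pluggable `summarizer` callable are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    task: str        # what the agent was asked to do
    attempt: str     # what it actually did (kept verbatim)
    succeeded: bool


@dataclass
class EpisodicMemory:
    # Hypothetical knob: derive a summary only once per this many new episodes.
    summarize_every: int = 5
    episodes: list = field(default_factory=list)   # raw evidence, never rewritten
    summaries: list = field(default_factory=list)  # optional derived notes

    def add(self, episode, summarizer=None):
        """Append the raw episode; derive a summary only occasionally."""
        self.episodes.append(episode)
        if summarizer is not None and len(self.episodes) % self.summarize_every == 0:
            # Each summary is built from raw episodes, not from earlier
            # summaries, so one rewrite is never fed into the next.
            recent = self.episodes[-self.summarize_every:]
            self.summaries.append(summarizer(recent))

    def retrieve(self, task, k=3):
        """Return up to k raw episodes ranked by word overlap with the task."""
        query = set(task.lower().split())
        ranked = sorted(
            self.episodes,
            key=lambda e: len(query & set(e.task.lower().split())),
            reverse=True,
        )
        return ranked[:k]
```

In a real agent, `summarizer` would presumably wrap an LLM call and retrieval would use embeddings rather than word overlap; the only point here is that summaries sit alongside the raw episodes instead of replacing them.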

----

arxiv.org/abs/2605.12978

Title: "Useful Memories Become Faulty When Continuously Updated by LLMs"

