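Today's Transformer agents get slower and more expensive at inference as their context keeps growing. To address this, the paper proposes memory consolidation modeled on human sleep. The core proposal is to add periodic "sleep phases": during one, the model pauses, rereads its recent context several times, writes the useful information into fixed-size memory layers (such as the fast weights of state-space blocks), and then clears its short-term attention cache. Because this consolidation happens offline, later answers still need only a single forward pass. Tests on cellular automata, graph lookup, and GSM-Infinite math problems show that longer sleep improves performance, especially on complex tasks that require deep reasoning. The idea suggests that long-running agents may be able to forget and reuse efficiently through memory consolidation, rather than carrying raw context around indefinitely.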
Long-running language agents may work better if they periodically stop to consolidate memory.
The problem is that today's transformer agents get slower and more expensive as their context grows, because attention has to keep checking more past tokens.
The usual fix for long context is to keep more tokens nearby, but that turns every next-token prediction into a larger search through the past.
The sharper idea here is that memory is not only storage.
Sometimes the hard part is converting a messy stretch of experience into a state that can actually be used later.
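As a rough illustration of that conversion, here is a minimal toy sketch of a sleep phase. It is not the paper's architecture. The class `SleepyMemory`, the delta-rule update, the learning rate, and the one-hot keys are all assumptions chosen to keep the example small. It shows only the general shape: a short-term cache that grows, a fixed-size fast-weight matrix that does not, and a sleep step that replays the cache several times before clearing it.

```python
from typing import List, Tuple

Vector = List[float]


class SleepyMemory:
    """Toy memory (hypothetical): a growing cache plus a fixed-size fast-weight matrix."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.dim = dim
        self.lr = lr  # assumed learning rate for the delta-rule update
        self.cache: List[Tuple[Vector, Vector]] = []  # short-term, grows with context
        self.W = [[0.0] * dim for _ in range(dim)]    # long-term, size never changes

    def observe(self, key: Vector, value: Vector) -> None:
        """Awake: append raw experience to the short-term cache."""
        self.cache.append((key, value))

    def recall(self, key: Vector) -> Vector:
        """Read from the fast weights in a single pass: W @ key."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.W]

    def sleep(self, passes: int) -> None:
        """Replay the cache `passes` times into W, then clear the cache."""
        for _ in range(passes):
            for key, value in self.cache:
                pred = self.recall(key)
                for i in range(self.dim):
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        # Delta rule: nudge W so that W @ key moves toward value.
                        self.W[i][j] += self.lr * err * key[j]
        self.cache.clear()


def recall_error(passes: int) -> float:
    """Absolute recall error for one stored pair after sleeping `passes` times."""
    mem = SleepyMemory(dim=3)
    mem.observe([1.0, 0.0, 0.0], [0.0, 2.0, 0.0])
    mem.observe([0.0, 1.0, 0.0], [4.0, 0.0, 0.0])
    mem.sleep(passes)
    out = mem.recall([1.0, 0.0, 0.0])
    return sum(abs(o - t) for o, t in zip(out, [0.0, 2.0, 0.0]))


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, recall_error(n))
```

In this toy setting, each additional replay pass halves the remaining recall error, and after sleeping the cache is empty while `W` keeps its original size. That mirrors, in a very simplified way, the claim that more sleep yields better recall without a growing context.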