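Transformer agents get slower and more expensive as their context grows. To address this, a new paper proposes a "sleep phase": the model pauses, rereads recent context several times, writes the useful information into fixed-size memory layers through the fast weights of state-space blocks, and then clears the attention cache. The extra compute happens during sleep, so normal prediction still needs only one forward pass. Tests on cellular automata, graph lookup, and GSM-Infinite math problems show that longer sleep improves performance, especially on hard problems that require deep reasoning. Key takeaway: long-horizon agents do not need to grow raw context without bound; they can consolidate the important parts and forget the raw tokens.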
Long-running language agents may work better if they periodically stop to consolidate memory.
The problem is that today's transformer agents get slower and more expensive as their context grows, because attention has to compare each new token against every past token still in the cache.
The usual fix for long context is to keep more tokens nearby, but that turns every next-token prediction into a larger search through the past.
The sharper idea here is that memory is not only storage.
Sometimes the hard part is converting a messy stretch of experience into a state that can actually be used later.
So the paper's idea is to add a sleep phase, where the model pauses, rereads recent context several times, writes the useful information into fixed-size memory layers, and then clears the short-term attention cache.
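To make the pause, reread, write, clear loop concrete, here is a minimal toy sketch in plain Python. It is not the paper's architecture: a simple delta-rule update on one fixed-size matrix stands in for the state-space fast-weight blocks, and every name here (`ToyAgent`, `observe`, `sleep`, `recall`, `EXPERIENCE`) is hypothetical and chosen only for illustration. The point it tries to show is that the memory stays the same size no matter how much was cached, and that more rereading passes during sleep leave less recall error afterwards.

```python
from dataclasses import dataclass, field

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ToyAgent:
    """Hypothetical agent with a growing cache and a fixed-size memory."""
    key_dim: int
    value_dim: int
    lr: float = 0.5  # assumed step size for the delta-rule write
    cache: list[tuple[Vector, Vector]] = field(default_factory=list)
    memory: list[Vector] = field(init=False)

    def __post_init__(self) -> None:
        # Fixed-size fast-weight matrix (value_dim x key_dim), starts empty.
        self.memory = [[0.0] * self.key_dim for _ in range(self.value_dim)]

    def observe(self, key: Vector, value: Vector) -> None:
        # Awake: raw experience piles up in the short-term cache.
        self.cache.append((key, value))

    def recall(self, key: Vector) -> Vector:
        # One cheap read from memory; the cost does not depend on cache length.
        return [dot(row, key) for row in self.memory]

    def sleep(self, passes: int) -> None:
        # Reread the cached context several times, nudging memory toward it.
        for _ in range(passes):
            for key, value in self.cache:
                predicted = self.recall(key)
                for row, target, guess in zip(self.memory, value, predicted):
                    step = self.lr * (target - guess)
                    for j, k_j in enumerate(key):
                        row[j] += step * k_j
        # Forget the raw tokens once they are consolidated.
        self.cache.clear()


def recall_error(agent: ToyAgent, items: list[tuple[Vector, Vector]]) -> float:
    """Total squared error when reading each stored value back from memory."""
    return sum(
        (guess - target) ** 2
        for key, value in items
        for guess, target in zip(agent.recall(key), value)
    )


# Made-up key/value pairs; the first two keys overlap on purpose,
# so a single pass cannot store them cleanly.
EXPERIENCE: list[tuple[Vector, Vector]] = [
    ([1.0, 0.0, 0.0], [1.0, 0.0]),
    ([0.6, 0.8, 0.0], [0.0, 1.0]),
    ([0.0, 0.0, 1.0], [1.0, 1.0]),
]


def run(passes: int) -> ToyAgent:
    agent = ToyAgent(key_dim=3, value_dim=2)
    for key, value in EXPERIENCE:
        agent.observe(key, value)
    agent.sleep(passes)
    return agent


if __name__ == "__main__":
    for passes in (1, 2, 4, 8):
        print(passes, recall_error(run(passes), EXPERIENCE))
```

In this toy, all the extra work sits inside `sleep`, while `recall` remains a single matrix-vector product. That mirrors the trade the thread describes, though the real model consolidates through learned layers rather than a hand-written update rule.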