Rohan Paul @rohanpaul_ai

2026-06-14 18:03

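AI summary

Transformer agents become slower and more expensive as their context grows. A new paper addresses this with a "sleep phase": the model pauses, rereads recent context several times, writes the useful information into fixed-size memory layers through the fast weights of its state-space blocks, and then clears the attention cache. The extra compute happens during sleep, so normal prediction still needs only one forward pass. Tests on cellular automata, graph lookup, and GSM-Infinite math problems show that longer sleep improves performance, especially on hard problems that require deeper reasoning. Key takeaway: long-horizon agents need not grow their raw context indefinitely; they can consolidate the important parts and forget the raw tokens.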

Long-running language agents may work better if they periodically stop to consolidate memory.

The problem is that today's transformer agents get slower and more expensive as their context grows, because each new token's attention has to compare against every past token still held in the cache.

The usual fix for long context is to keep more tokens within attention's reach, but that turns every next-token prediction into a larger search through the past.
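A back-of-the-envelope cost model makes this scaling concrete. It is our own simplification, not taken from the paper: it counts only the multiply-adds for one attention head's readout (query-key scores plus the weighted sum of values) and ignores projections, MLPs, and hardware effects. The function names and the fixed `head_dim × head_dim` memory used for comparison are illustrative assumptions.

```python
def attention_readout_cost(num_tokens: int, head_dim: int) -> int:
    """Multiply-adds for one causal attention head over a growing KV cache.

    Token t (1-indexed) scores its query against t cached keys (t * head_dim)
    and then mixes t value vectors (another t * head_dim), so the per-token
    cost grows linearly with t and the total grows quadratically.
    """
    return sum(2 * t * head_dim for t in range(1, num_tokens + 1))


def fixed_memory_readout_cost(num_tokens: int, head_dim: int) -> int:
    """Multiply-adds for reading a fixed head_dim x head_dim memory once per token.

    Each read is one matrix-vector product, so the per-token cost does not
    depend on how many tokens came before.
    """
    return num_tokens * head_dim * head_dim


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, attention_readout_cost(n, 64), fixed_memory_readout_cost(n, 64))
```

With `head_dim = 64`, per-token attention work (2 · t · 64) exceeds one read of a 64 × 64 memory (64 · 64) once the cache holds more than 32 tokens, and the gap widens linearly from there.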

The sharper idea here is that memory is not only storage.

Sometimes the hard part is converting a messy stretch of experience into a state that can actually be used later.

So the paper's idea is to add a sleep phase, where the model pauses, rereads recent context several times, writes the useful information into fixed-size memory layers, and then clears the short-term attention cache.

Concretely, that fixed-size memory lives in fast weights inside the model's state-space blocks: during sleep, the offline passes over recent context write into those fast weights, and only then is the attention cache cleared.
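The sleep cycle can be sketched as a toy, under stated assumptions. The post does not give the paper's write rule, so this sketch substitutes the classic delta rule (`W += lr * (v - W k) kᵀ`) for the fast-weight update inside a state-space block; the `SleepyMemory` class, its method names, and the plain-list matrix are hypothetical, and keys are assumed to be unit-norm.

```python
from typing import List, Tuple

Vector = List[float]


class SleepyMemory:
    """Hypothetical toy: a growing KV cache plus a fixed-size fast-weight matrix W.

    W has shape (value_dim, key_dim) no matter how many tokens are consolidated.
    """

    def __init__(self, key_dim: int, value_dim: int) -> None:
        self.W = [[0.0] * key_dim for _ in range(value_dim)]
        self.cache: List[Tuple[Vector, Vector]] = []  # short-term (key, value) pairs

    def observe(self, key: Vector, value: Vector) -> None:
        """Awake step: append to the attention cache, which grows with context."""
        self.cache.append((key, value))

    def recall(self, query: Vector) -> Vector:
        """Read the fast weights with one matrix-vector product: W @ query."""
        return [sum(w * q for w, q in zip(row, query)) for row in self.W]

    def sleep(self, passes: int, lr: float = 1.0) -> None:
        """Replay the cache `passes` times, writing into W, then clear the cache."""
        for _ in range(passes):
            for key, value in self.cache:
                prediction = self.recall(key)  # uses W before this key's update
                # Delta rule (stand-in assumption): nudge W so W @ key moves toward value.
                for i, row in enumerate(self.W):
                    error = value[i] - prediction[i]
                    for j in range(len(row)):
                        row[j] += lr * error * key[j]
        self.cache.clear()  # forget the raw tokens; only W remains


if __name__ == "__main__":
    mem = SleepyMemory(key_dim=3, value_dim=2)
    mem.observe([1.0, 0.0, 0.0], [5.0, -1.0])
    mem.observe([0.0, 1.0, 0.0], [2.0, 3.0])
    mem.sleep(passes=1)
    print(len(mem.cache), mem.recall([1.0, 0.0, 0.0]))
```

Sleep costs `passes × len(cache)` delta-rule updates, while `recall` stays one matrix-vector product over a fixed-size `W`, however many tokens have been consolidated. That mirrors the post's point that the extra compute is paid during sleep rather than at answer time.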

This means the model pays the extra compute while sleeping rather than while answering, so each normal prediction still takes a single forward pass.

The authors test this on cellular automata, graph lookup, and GSM-Infinite math problems, where the model must use old information that is no longer sitting in its attention cache.

The main result is that longer sleep improves performance, especially on harder cases that need deeper reasoning rather than just remembering a fact.
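Why extra replay passes can matter for a fixed-size memory can be illustrated in a toy setting, with loud caveats: the delta rule again stands in for the paper's fast-weight update, the keys and values are made up, and this does not reproduce the paper's reasoning results. When stored keys overlap, each write partially overwrites earlier ones, so one replay leaves recall errors that further replays shrink. For unit keys and `lr = 1`, repeated delta-rule passes are cyclic Kaczmarz iterations, which converge to exact recall when the keys are linearly independent. The helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def consolidate(keys: List[Vector], values: List[Vector], passes: int,
                lr: float = 1.0) -> List[List[float]]:
    """Replay (key, value) pairs `passes` times with the delta rule; return W."""
    key_dim, value_dim = len(keys[0]), len(values[0])
    W = [[0.0] * key_dim for _ in range(value_dim)]
    for _ in range(passes):
        for key, value in zip(keys, values):
            for i, row in enumerate(W):
                # Row i only affects output i, so its error uses its own current weights.
                error = value[i] - sum(w * k for w, k in zip(row, key))
                for j in range(key_dim):
                    row[j] += lr * error * key[j]
    return W


def max_recall_error(W: List[List[float]], keys: List[Vector],
                     values: List[Vector]) -> float:
    """Largest absolute gap between W @ key and its stored value."""
    worst = 0.0
    for key, value in zip(keys, values):
        for row, target in zip(W, value):
            worst = max(worst, abs(target - sum(w * k for w, k in zip(row, key))))
    return worst


if __name__ == "__main__":
    # Overlapping (non-orthogonal) but linearly independent unit keys.
    keys = [normalize([1.0, 1.0, 0.0]), normalize([1.0, 0.0, 1.0]),
            normalize([0.0, 1.0, 1.0])]
    values = [[1.0], [0.0], [0.0]]
    for passes in (1, 2, 5, 20, 50):
        W = consolidate(keys, values, passes)
        print(passes, max_recall_error(W, keys, values))
```

With orthonormal keys, a single pass would already recall exactly; it is the overlap between keys that makes additional sleep passes worthwhile in this toy.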

The big deal is that long-horizon agents may not need to carry bigger and bigger raw context forever, because they can consolidate the important parts and safely forget the raw tokens.

----

Link - arxiv.org/abs/2605.26099

Title: "Language Models Need Sleep"
