Language Models Need Sleep

2026-05-25 08:00

AI Summary

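To address the poor scaling of the Transformer attention mechanism with context length, the study proposes a sleep-like consolidation mechanism. The model periodically converts recent context into persistent fast weights and then clears its key-value cache. During sleep, it performs N offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. This shifts the extra computation to the sleep phase and preserves the latency of wake-time prediction. The method is tested on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, and on a realistic math reasoning task, on which a regular Transformer and SSM-attention hybrid models fail. Increasing the sleep duration N improves performance, with the largest gains on examples that require deeper reasoning.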

Original Abstract

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs N offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning.
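The wake/sleep cycle described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class `ToyFastWeightBlock`, its methods, and the helper `recall_error` are hypothetical names, a single weight matrix stands in for an SSM block, and a fixed delta rule (a simple local error-correcting update) is swapped in for the paper's learned local rule, whose form the abstract does not specify.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToyFastWeightBlock:
    """Hypothetical stand-in for one block with persistent fast weights."""

    def __init__(self, dim, lr=0.1):
        self.dim = dim
        self.lr = lr
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        self.kv_cache = []  # (key, value) pairs accumulated while awake

    def wake_step(self, key, value):
        # Awake: only append to the cache, no weight updates.
        self.kv_cache.append((key, value))

    def sleep(self, n_passes):
        # Asleep: make n_passes offline passes over the cached context.
        # Assumption: a fixed delta rule replaces the learned local rule.
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = matvec(self.fast_weights, key)
                err = [v - p for v, p in zip(value, pred)]
                for i in range(self.dim):
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * err[i] * key[j]
        # The cache is cleared; only the fast weights persist.
        self.kv_cache.clear()

    def recall(self, key):
        # Wake-time read: one matrix-vector product, independent of n_passes.
        return matvec(self.fast_weights, key)


def recall_error(n_passes):
    """Squared recall error after sleeping for n_passes on a fixed toy context."""
    pairs = [
        ([1.0, 0.0, 0.0], [0.5, -1.0, 2.0]),
        ([0.0, 1.0, 0.0], [1.5, 0.0, -0.5]),
        ([0.0, 0.0, 1.0], [-2.0, 1.0, 1.0]),
    ]
    block = ToyFastWeightBlock(dim=3)
    for key, value in pairs:
        block.wake_step(key, value)
    block.sleep(n_passes)
    return sum(
        (v - p) ** 2
        for key, value in pairs
        for v, p in zip(value, block.recall(key))
    )
```

In this toy setting, `recall_error(20)` is smaller than `recall_error(1)`, while the cost of `recall` stays at one matrix-vector product regardless of how long the block slept. That mirrors only the qualitative claim that extra computation moves into sleep; it says nothing about the paper's actual architecture or results.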

HuggingFace Daily Papers (community trending papers)

Read the original: arxiv.org
