Language Models Need Rest, Too
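Source: arxiv.org. A new paper argues that language models need rest too. It was posted to arXiv on May 26, 2026 (arXiv:2605.26099) and drew 102 points on Hacker News. The paper studies a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache.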
Computer Science > Computation and Language
Title: Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference
Abstract: Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.
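The sleep/wake cycle described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the class `ToySleepMemory`, the helper `recall_error`, the learning rate, and the example vectors are all hypothetical, and a hand-written delta rule stands in for the learned local rule inside SSM blocks. The sketch only shows the control flow: context accumulates in a cache, a sleep phase makes $N$ passes over it to update a fast-weight matrix, the cache is cleared, and wake-time recall is a single matrix-vector product.

```python
# Toy illustration only. A fixed delta rule stands in for the paper's
# learned local update; all names here are hypothetical.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToySleepMemory:
    def __init__(self, dim, lr=0.5):
        self.W = [[0.0] * dim for _ in range(dim)]  # persistent fast weights
        self.cache = []                             # stand-in for the KV cache
        self.lr = lr

    def observe(self, key, value):
        """Wake phase: store recent context in the cache."""
        self.cache.append((key, value))

    def sleep(self, n_passes):
        """Offline phase: n_passes sweeps over the cache, then clear it."""
        for _ in range(n_passes):
            for key, value in self.cache:
                pred = matvec(self.W, key)
                err = [v - p for v, p in zip(value, pred)]
                # Local delta-rule update: W += lr * outer(err, key)
                for i, e in enumerate(err):
                    for j, k in enumerate(key):
                        self.W[i][j] += self.lr * e * k
        self.cache.clear()

    def recall(self, key):
        """Wake-time read: one matvec, no cache lookup."""
        return matvec(self.W, key)


def recall_error(n_passes):
    """Squared recall error after sleeping for n_passes sweeps."""
    pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.6, 0.8], [1.0, 0.0])]
    mem = ToySleepMemory(dim=2)
    for key, value in pairs:
        mem.observe(key, value)
    mem.sleep(n_passes)
    return sum(
        (v - p) ** 2
        for key, value in pairs
        for v, p in zip(value, mem.recall(key))
    )


if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        print(n, recall_error(n))
```

In this toy setting, recall error shrinks as the number of sleep passes grows, while the cost of `recall` stays fixed. That loosely mirrors the abstract's claim that extra computation is shifted to sleep without adding wake-time latency; it says nothing about the paper's actual tasks or results.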