# Language Models Need Rest Too

- Source: Hacker News top stories (Chinese translation via buzzing.cc)
- Author: juxtapose
- Published: 2026-05-27 01:49
- AIHOT score: 66
- AIHOT link: https://aihot.virxact.com/items/cmpmxzzm70sttsl01jrvdaoin
- Original link: https://arxiv.org/abs/2605.26099

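## AI Summary

A new study argues that language models need rest too. The paper was posted to arXiv on May 26, 2026 (arXiv:2605.26099) and reached 102 points on Hacker News. It studies a sleep-like consolidation mechanism: the model periodically converts recent context into persistent fast weights and then clears its key-value cache, shifting extra computation to an offline "sleep" phase while keeping wake-time prediction latency unchanged.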

## Full Text

Title: Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

Abstract: Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.
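The wake/sleep split described in the abstract can be illustrated with a toy sketch. This is not the paper's method. It assumes a single linear associative memory in place of the SSM blocks, and it swaps the learned local rule for a hand-written delta rule, `W += lr * (v - W k) k^T`. The names `sleep`, `recall_error`, `matvec`, and `unit`, the learning rate, and the toy dimensions are all hypothetical and chosen only for illustration.

```python
import random

Vector = list[float]
Matrix = list[list[float]]
Pair = tuple[Vector, Vector]  # (key, value)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def unit(v: Vector) -> Vector:
    """Scale a vector to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def sleep(fast_weights: Matrix, kv_cache: list[Pair],
          n_passes: int, lr: float = 0.5) -> Matrix:
    """Run n_passes offline passes over the cached (key, value) pairs.

    Each step applies a delta rule, W += lr * (v - W k) k^T. This is an
    assumed stand-in for the learned local rule, not the paper's rule.
    """
    w = [row[:] for row in fast_weights]  # copy; the caller's weights stay untouched
    for _ in range(n_passes):
        for key, value in kv_cache:
            pred = matvec(w, key)  # prediction before this step's update
            for i, (v_i, p_i) in enumerate(zip(value, pred)):
                for j, k_j in enumerate(key):
                    w[i][j] += lr * (v_i - p_i) * k_j
    return w


def recall_error(fast_weights: Matrix, pairs: list[Pair]) -> float:
    """Sum of squared errors when values are read back from the fast weights alone."""
    total = 0.0
    for key, value in pairs:
        pred = matvec(fast_weights, key)
        total += sum((v - p) ** 2 for v, p in zip(value, pred))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_items = 8, 4
    items = [
        (unit([rng.gauss(0, 1) for _ in range(dim)]),
         [rng.gauss(0, 1) for _ in range(dim)])
        for _ in range(n_items)
    ]
    zeros = [[0.0] * dim for _ in range(dim)]
    for n in (0, 1, 2, 4, 8):
        kv_cache = list(items)            # context accumulated while awake
        w = sleep(zeros, kv_cache, n)     # offline consolidation, N = n passes
        kv_cache.clear()                  # cache dropped; only w persists
        print(f"N={n}: recall error {recall_error(w, items):.6f}")
```

In this sketch a wake-time read is one `matvec` call regardless of how many passes were spent asleep, which mirrors the abstract's point that extra computation moves to sleep without adding wake-time latency. For the special case of orthonormal keys with `lr = 0.5`, each pass halves every residual, so the squared recall error shrinks by a factor of 0.25 per pass.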

- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Cite as: arXiv:2605.26099 [cs.CL] (or arXiv:2605.26099v3 [cs.CL] for this version)
- DOI: https://doi.org/10.48550/arXiv.2605.26099 (arXiv-issued DOI via DataCite)

