Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
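Source: arxiv.org
Inspired by the human learning process, the study proposes a "sleep" paradigm that lets large language models learn continually. The paradigm has two stages. The first is memory consolidation, in which "Knowledge Seeding" distills the memories of a smaller model upward into a larger network, preserving knowledge while adding capacity. The second is "dreaming", in which the model uses reinforcement learning to generate a curriculum of synthetic data for self-rehearsal and self-improvement, without human supervision. Experiments validate the importance of this paradigm on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks.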
The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to learn continually and to effectively transfer their temporal in-context knowledge into their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to learn continually, distill their fragile short-term memories into stable long-term knowledge through replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, in which the memories of a smaller self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, in which the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
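To make the idea of upward distillation concrete, the following is a minimal sketch of a plain token-level distillation loss, in which a larger network is trained to match the next-token distributions of its smaller self. It is not the paper's Generalized Distillation process: the abstract does not specify the objective, and the on-policy sampling and RL-based imitation components are omitted here. The function names (`softmax`, `kl_divergence`, `seeding_loss`), the choice of KL direction, and the toy logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def seeding_loss(small_logits, large_logits):
    """Average per-token KL from the smaller model's distribution (teacher)
    to the larger model's distribution (student).

    Hypothetical helper: both arguments are lists of per-token logit lists.
    The KL direction is an illustrative assumption, not taken from the paper.
    """
    per_token = [
        kl_divergence(softmax(s), softmax(l))
        for s, l in zip(small_logits, large_logits)
    ]
    return sum(per_token) / len(per_token)


# Toy example: two token positions over a three-word vocabulary.
small = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
large_untrained = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

loss_before = seeding_loss(small, large_untrained)  # positive: distributions differ
loss_matched = seeding_loss(small, small)           # zero: distributions are identical
```

Minimizing such a loss with respect to the larger network's parameters would pull its predictions toward those of the smaller model, which is the sense in which knowledge is preserved while capacity grows.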