Temporally Extended Mixture-of-Experts Models
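Source: arxiv.org
Building on the options framework from reinforcement learning, the researchers propose a temporally extended MoE architecture. A controller added to each layer learns when to switch experts, which addresses the memory inefficiency caused by frequent expert switching in conventional MoE. In experiments on gpt-oss-20b, the method combines low-rank adapters with a self-distillation reward. It cuts the expert switch rate from over 50% to below 5% while retaining up to 90% of the base model's accuracy on MATH and other benchmarks. Because the approach is lightweight, existing pre-trained models can be converted into memory-efficient temporally extended MoEs, with a flexible trade-off between switching overhead and model capability.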
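The sketch below makes the reported switch rates concrete by computing one plausible version of the metric in Python. It assumes that a "switch" is any token at which a layer's active expert set differs from the previous token's set. This definition is a hypothesis, not the paper's stated formula. The function name `switch_rate` and the toy routing traces are illustrative only.

```python
from typing import FrozenSet, Sequence


def switch_rate(expert_sets: Sequence[FrozenSet[int]]) -> float:
    """Fraction of consecutive token pairs whose active expert set differs.

    Hypothetical metric for a single layer and a single sequence; the paper's
    exact definition of "switch rate" may differ.
    """
    if len(expert_sets) < 2:
        return 0.0  # no transitions, so nothing can switch
    switches = sum(
        1 for prev, cur in zip(expert_sets, expert_sets[1:]) if prev != cur
    )
    return switches / (len(expert_sets) - 1)


# Per-token top-2 routing (conventional MoE): the set changes at every token.
token_level = [frozenset({0, 1}), frozenset({2, 5}), frozenset({1, 5}), frozenset({0, 3})]
# Temporally extended routing: one expert set is held across a run of tokens.
extended = [frozenset({0, 1})] * 3 + [frozenset({2, 5})] * 3

print(switch_rate(token_level))
print(switch_rate(extended))
```

Under this definition, routing that changes its expert set at every token scores 1.0. Holding each set for a run of tokens lowers the rate to (number of runs - 1) / (number of tokens - 1), which is 0.2 for the six-token trace above.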
Mixture-of-Experts models, now popular for scaling capacity at fixed inference speed, switch experts at nearly every token. Once a model outgrows available GPU memory, this churn can render optimizations like offloading and pre-fetching ineffective. We make the case that the options framework in reinforcement learning is a perfect match to tackle this problem, and argue for temporally extended mixture-of-experts layers. Building on the option-critic framework with deliberation costs, we add a controller to each layer that learns when to switch expert sets and which to load. By applying this to gpt-oss-20b with low-rank adapters and a self-distillation reward, our method reduces switch rates from over 50% to below 5% while retaining up to 90% of base-model accuracy on MATH, MMLU, and MMMLU. This shows that even existing pre-trained models can be converted to temporally extended MoEs with lightweight training, with the deliberation cost allowing model trainers to trade off switching rates against capability. We hope this opens a principled path, grounded in the options framework, for memory-efficient serving and continual learning in ever-growing MoE models.
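The deliberation cost can be illustrated with the standard option-critic termination update. In that update, the cost η is added to the advantage of continuing the current option. The following is a minimal, state-free Python sketch that rests on several assumptions. Here the "option" is the currently loaded expert set. The values `q_keep` and `v` are supplied as inputs rather than learned. The names `termination_update` and `sigmoid` are hypothetical. This is not the authors' controller, which would presumably condition on the layer's hidden state and also choose which expert set to load next.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def termination_update(theta: float, q_keep: float, v: float,
                       eta: float, lr: float = 0.5) -> float:
    """One termination-gradient step for a hypothetical, state-free controller.

    beta = sigmoid(theta) is the probability of ending the current option,
    i.e. switching to a new expert set. The advantage q_keep - v measures how
    much better keeping the current set is than re-choosing. Adding the
    deliberation cost eta means re-choosing must beat keeping by more than
    eta before the update makes switching more likely.
    """
    beta = sigmoid(theta)
    dbeta_dtheta = beta * (1.0 - beta)  # derivative of the sigmoid w.r.t. theta
    advantage = q_keep - v
    # Option-critic rule: theta <- theta - lr * dbeta/dtheta * (A + eta)
    return theta - lr * dbeta_dtheta * (advantage + eta)


# The current expert set is slightly worse than re-choosing (advantage -0.05).
for eta in (0.0, 0.1):
    theta = 0.0
    for _ in range(100):
        theta = termination_update(theta, q_keep=0.95, v=1.0, eta=eta)
    print(f"eta={eta}: switch probability {sigmoid(theta):.3f}")
```

With η = 0, a current expert set that is slightly worse than re-choosing pushes the switch probability above 0.5. With η = 0.1, the same small disadvantage no longer justifies a switch, so the probability falls below 0.5. In this toy, η acts as the knob that trades switching rate against the value of always using the best expert set.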