# 用流形幂迭代重新设计混合专家模型路由器

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-06-10 08:00
- AIHOT 分数：55
- AIHOT 链接：https://aihot.virxact.com/items/cmq912fmh07nqslld3hj0fij0
- 原文链接：https://arxiv.org/abs/2606.12397

## AI 摘要

MoE模型中路由器矩阵的每一行作为专家代理，通过计算与输入的相似度来决定激活哪些专家。理想情况下，每一行应编码对应专家矩阵的主奇异方向，使点积能更好反映token与专家的亲和度。然而现有设计缺少对齐约束。为此提出Manifold Power Iteration (MPI)方法，采用“Power-then-Retract”范式：先在路由器权重上执行幂迭代步骤，再通过回缩施加范数约束以保证效率和稳定性。理论表明MPI驱动路由器行收敛至对应专家的主奇异方向。在1B至11B参数规模的MoE模型预训练中证实该对齐能提升模型有效性。

## 正文

Router is the cornerstone component to the Mixture-of-Experts models. Serving as expert proxies, the rows of the router matrix compute their similarity to the MoE inputs to determine which subset of experts is activated. Ideally, each router row is designed to encode the expert matrix into this representative vector, such that its dot-product with token can better reflect token-expert affinity. However, there exists no design principles to enforce this condensation. In this paper, we propose to align each router row with the principal singular direction of the associated expert, as this direction provides the most expressive mathematical description of a matrix. Based on this principle, we propose a router redesign with Manifold Power Iteration (MPI). Specifically, it introduces a "Power-then-Retract" paradigm, where a power iteration step is performed on the router weights, followed by a retraction to impose a norm constraint to ensure both efficiency and stability. Theoretically, we show that MPI drives router rows to converge toward the principal singular directions of associated experts. Empirically, we pretrain MoE model across scales from 1B to 11B parameters to confirm that this alignment facilitates more effective MoE models.
