SemiAnalysis (@SemiAnalysis_)

2026-06-18 05:00 · 15 days ago

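AI Summary

Wide Expert Parallelism increases the total memory bandwidth available to each MoE deployment. The model's MoE expert weights are spread across multiple GPUs, so each GPU only has to load a small fraction of them. That translates into higher throughput per GPU, improving performance per dollar and performance per watt.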

Wide Expert Parallelism increases the total memory bandwidth available per MoE deployment. This means the model distributes the MoE expert weights across multiple GPUs, so each GPU only needs to load a tiny fraction of the weights. This translates to higher throughput per GPU, increasing perf per dollar and perf per watt.
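The arithmetic behind that claim can be sketched roughly as follows. This is a back-of-the-envelope illustration, not anything from the original post: the function names, the expert count (256), the per-expert size (40 MB), and the per-GPU bandwidth (4 TB/s) are all hypothetical. It also assumes experts divide evenly across GPUs, that every local expert is touched on each decode step, and it ignores the all-to-all communication that expert parallelism adds.

```python
def expert_bytes_per_gpu(num_experts: int, bytes_per_expert: int, ep_degree: int) -> int:
    """Bytes of expert weights each GPU holds (and must stream) for one MoE layer."""
    if num_experts % ep_degree != 0:
        raise ValueError("this sketch assumes experts divide evenly across GPUs")
    return (num_experts // ep_degree) * bytes_per_expert


def aggregate_bandwidth(per_gpu_bandwidth: float, ep_degree: int) -> float:
    """Total memory bandwidth (bytes/s) across all GPUs in the EP group."""
    return per_gpu_bandwidth * ep_degree


def weight_load_time(num_experts: int, bytes_per_expert: int,
                     ep_degree: int, per_gpu_bandwidth: float) -> float:
    """Seconds for one GPU to stream its local expert weights once (worst case)."""
    local_bytes = expert_bytes_per_gpu(num_experts, bytes_per_expert, ep_degree)
    return local_bytes / per_gpu_bandwidth


if __name__ == "__main__":
    # Hypothetical figures, chosen only to make the scaling visible.
    NUM_EXPERTS = 256
    BYTES_PER_EXPERT = 40_000_000   # 40 MB per expert (assumption)
    HBM_BANDWIDTH = 4e12            # 4 TB/s per GPU (assumption)

    for ep in (8, 16, 32, 64):
        local = expert_bytes_per_gpu(NUM_EXPERTS, BYTES_PER_EXPERT, ep)
        total_bw = aggregate_bandwidth(HBM_BANDWIDTH, ep)
        t = weight_load_time(NUM_EXPERTS, BYTES_PER_EXPERT, ep, HBM_BANDWIDTH)
        print(f"EP={ep:3d}  local expert bytes={local:>13,d}  "
              f"aggregate BW={total_bw:.2e} B/s  load time={t * 1e6:.1f} us")
```

Under these assumptions, widening the expert-parallel group shrinks the weight bytes each GPU streams per step in direct proportion, while the bandwidth available to the deployment as a whole grows by the same factor. Real deployments would also have to account for routing imbalance and interconnect cost, which this sketch leaves out.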

Phenomena/Trends · Deployment/Engineering
