# Wide Expert Parallelism Improves MoE Throughput and Cost Efficiency

- Source: SemiAnalysis (@SemiAnalysis_)
- Published: 2026-06-18 05:00
- AIHOT score: 45
- AIHOT link: https://aihot.virxact.com/items/cmqiksvfo01wasl5wogrqqejk
- Original link: https://x.com/SemiAnalysis_/status/2067351751561257327

## AI Summary

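Wide Expert Parallelism increases the total memory bandwidth available to each MoE deployment. This means the model distributes the MoE expert weights across multiple GPUs, so each GPU only needs to load a small fraction of the weights. This translates to higher throughput per GPU, improving performance per dollar and performance per watt.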

## Body

Wide Expert Parallelism increases the total memory bandwidth available per MoE deployment. This means the model distributes the MoE expert weights across multiple GPUs, so each GPU only needs to load a tiny fraction of the weights. This translates to higher throughput per GPU, increasing perf per dollar and perf per watt.
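
To make the bandwidth argument concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the original post: the `MoEDeployment` class, the 600 GB expert-weight size, and the 4 TB/s per-GPU bandwidth are all hypothetical values chosen for illustration. It assumes experts are split evenly with no replication, and it ignores all-to-all communication, attention, and KV-cache traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEDeployment:
    """Toy model of one MoE deployment; all figures are illustrative assumptions."""

    expert_weight_bytes: float  # total size of all MoE expert weights, in bytes
    hbm_bandwidth: float        # memory bandwidth of one GPU, in bytes/second
    ep_degree: int              # number of GPUs the experts are spread across

    def aggregate_bandwidth(self) -> float:
        # Total memory bandwidth of the deployment grows linearly with GPU count.
        return self.hbm_bandwidth * self.ep_degree

    def weight_bytes_per_gpu(self) -> float:
        # Assumes experts are split evenly across GPUs, with no replication.
        return self.expert_weight_bytes / self.ep_degree

    def weight_read_seconds(self) -> float:
        # Time for one GPU to stream its resident expert weights once.
        return self.weight_bytes_per_gpu() / self.hbm_bandwidth


def summarize(d: MoEDeployment) -> str:
    """Format the three quantities of interest for one deployment."""
    return (
        f"EP={d.ep_degree:>3}: "
        f"{d.weight_bytes_per_gpu() / 1e9:8.2f} GB/GPU, "
        f"{d.aggregate_bandwidth() / 1e12:7.1f} TB/s aggregate, "
        f"{d.weight_read_seconds() * 1e3:7.2f} ms per weight pass"
    )


if __name__ == "__main__":
    # Hypothetical numbers: 600 GB of expert weights, 4 TB/s of bandwidth per GPU.
    for ep in (8, 16, 32, 64):
        print(summarize(MoEDeployment(600e9, 4e12, ep)))
```

Under these assumptions, going from 8 to 64 GPUs cuts the expert weights each GPU must stream by a factor of 8 and raises aggregate bandwidth by the same factor. Real deployments will see smaller gains once communication and load imbalance are accounted for.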