SemiAnalysis (@SemiAnalysis_)

2026-04-30 15:47

AI Summary

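On the DeepSeek-V4 Pro 1.6T model, a GB300 NVL72 system running rack-scale disaggregated serving delivers up to 6.5x the performance of B200. The high-throughput configuration relies on DeepSeek-AI's MegaMoe kernels, which fully fuse and overlap expert-parallel (EP) dispatch, EP combine, and the GEMM operations in a single kernel. Engineers from Radixark, LMSYS, and NVIDIA AI enabled this result quickly. CoreWeave contributed temporary GB300 NVL72 racks to this open-source performance optimization work, which benefits the whole community.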

GB300 NVL72 rack-scale Dynamo SGLang disaggregation has up to 6.5x better performance than B200 on DeepSeek-V4 Pro 1.6T 🚀 The high-throughput configuration uses @deepseek_ai's MegaMoe kernels, which fully fuse and overlap EP dispatch, EP combine, and the GEMMs into a single kernel. Credit goes to the 10x engineers @BanghuaZ, Tom, and the rest of the team at @radixark, @lmsysorg, and @NVIDIAAI for rapidly enabling this performance! Big shoutout to @CoreWeave for contributing temporary GB300 NVL72 racks towards open-source performance optimization for all to benefit!

DeepSeek inference evaluation/benchmarks
