# GB300 NVL72 Delivers Up to 6.5x Better Performance than B200 on DeepSeek-V4 Pro

- Source: SemiAnalysis (@SemiAnalysis_)
- Published: 2026-04-30 15:47
- AIHOT score: 53
- AIHOT link: https://aihot.virxact.com/items/cmol7e4s900x7slc584ioqi5b
- Original link: https://x.com/SemiAnalysis_/status/2049757475235049538

## AI Summary

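On the DeepSeek-V4 Pro 1.6T model, a GB300 NVL72 system running rack-scale disaggregated serving achieves up to 6.5x better performance than B200. The high-throughput configuration relies on DeepSeek-AI's MegaMoe kernels, which fully fuse and overlap expert-parallel (EP) dispatch, EP combine, and the GEMMs in a single kernel. Engineers from Radixark, LMSYS, and NVIDIA AI enabled this result quickly. CoreWeave contributed temporary GB300 NVL72 racks to this open-source performance optimization effort, which benefits the whole community.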

## Full Text

GB300 NVL72 rack-scale Dynamo SGLang disaggregation has up to 6.5x better performance than B200 on DeepSeek-V4 Pro 1.6T 🚀 The high-throughput configuration uses @deepseek_ai's MegaMoe kernels, which fully fuse and overlap EP dispatch, EP combine, and the GEMMs into a single kernel. Credit goes to the 10x engineers @BanghuaZ, Tom, and the rest of the team at @radixark, @lmsysorg, and @NVIDIAAI for rapidly enabling this performance! Big shoutout to @CoreWeave for contributing temporary GB300 NVL72 racks towards the open source performance optimization for all to benefit!
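
For readers unfamiliar with the three stages named in the post, the following is a minimal single-process sketch of what an unfused mixture-of-experts layer does: dispatch groups tokens by their selected expert, each expert runs its GEMM, and combine sums the gate-weighted results back per token. It is not DeepSeek's MegaMoe code and says nothing about how that kernel is implemented. Every function name, the routing format, and the toy shapes are assumptions made for illustration, and the cross-GPU communication that EP dispatch and combine involve in a real deployment is replaced here by plain Python dictionaries.

```python
# Conceptual, unfused reference for one MoE layer: dispatch -> expert GEMM -> combine.
# All names and data layouts are hypothetical; this is not the MegaMoe kernel.

def matvec(weight, x):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def dispatch(routes):
    """Group token indices by expert.

    routes[t] is a list of (expert_id, gate_weight) pairs for token t.
    In a multi-GPU setup this step would be an all-to-all exchange.
    """
    per_expert = {}
    for t, choices in enumerate(routes):
        for expert_id, gate in choices:
            per_expert.setdefault(expert_id, []).append((t, gate))
    return per_expert

def expert_gemm(tokens, per_expert, expert_weights):
    """Each expert applies its own weight matrix to the tokens routed to it."""
    outputs = {}
    for expert_id, items in per_expert.items():
        weight = expert_weights[expert_id]
        outputs[expert_id] = [(t, gate, matvec(weight, tokens[t])) for t, gate in items]
    return outputs

def combine(num_tokens, out_dim, outputs):
    """Sum gate-weighted expert outputs back into per-token results."""
    result = [[0.0] * out_dim for _ in range(num_tokens)]
    for items in outputs.values():
        for t, gate, y in items:
            for i, value in enumerate(y):
                result[t][i] += gate * value
    return result

def moe_layer(tokens, routes, expert_weights):
    """Run the three stages one after another, with no fusion or overlap."""
    per_expert = dispatch(routes)
    outputs = expert_gemm(tokens, per_expert, expert_weights)
    out_dim = len(expert_weights[0])
    return combine(len(tokens), out_dim, outputs)

if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    # Token 0 is split evenly across experts 0 and 1; token 1 goes to expert 1 only.
    routes = [[(0, 0.5), (1, 0.5)], [(1, 1.0)]]
    expert_weights = [
        [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
        [[2.0, 0.0], [0.0, 2.0]],  # expert 1: scale by 2
    ]
    print(moe_layer(tokens, routes, expert_weights))
```

In this staged form each step waits for the previous one to finish. According to the post, MegaMoe instead fuses and overlaps these stages in a single kernel.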