# Ganging Up Multiple B200 GPU Machines Raises Throughput by Up to 7x and Significantly Cuts Cost

- Source: SemiAnalysis (@SemiAnalysis_)
- Published: 2026-05-13 01:01
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmp2vs0nh0127sl1qt0cpqofc
- Original link: https://x.com/SemiAnalysis_/status/2054245527957508520

## AI Summary

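Connecting multiple B200 8-GPU machines over RoCEv2 CX-7 Ethernet with Tomahawk switches, and applying an inference optimization called PD (prefill/decode) disaggregation, raises per-GPU token throughput by up to 7x. Because throughput per GPU rises by that factor, the cost per million tokens falls by up to 7x as well. The result builds on the open-source engine developed by Inferact and the vLLM project, together with the Dynamo inference orchestrator built by the NVIDIA team. Further improvements to disaggregated B200 performance are expected.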

## Body

THE MORE U BUY, THE MORE U SAVE: By ganging up multiple B200 8-GPU machines over RoCEv2 CX-7 Ethernet with Tomahawk switches and using an inference optimization called PD disaggregation, per-GPU token throughput increases by up to 7x. Increasing per-GPU token throughput by up to 7x also decreases cost per million tokens by up to 7x.
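The link between throughput and cost can be checked with a little arithmetic. The sketch below is illustrative only: the hourly GPU price and baseline throughput are hypothetical placeholders, not figures from the post, and it assumes the per-GPU hourly cost stays the same when machines are networked together. Under that assumption, cost per million tokens is inversely proportional to per-GPU throughput, so a 7x throughput gain yields a 7x cost reduction.

```python
def cost_per_million_tokens(gpu_hour_cost_usd: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """USD cost to produce one million tokens on a single GPU."""
    if tokens_per_sec_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hour_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only for illustration (not from the post).
GPU_HOUR_COST_USD = 3.0        # assumed all-in hourly cost of one GPU
BASELINE_TOK_PER_SEC = 1000.0  # assumed per-GPU throughput without disaggregation
SPEEDUP = 7.0                  # the post's "up to 7x" upper bound

baseline_cost = cost_per_million_tokens(GPU_HOUR_COST_USD, BASELINE_TOK_PER_SEC)
disagg_cost = cost_per_million_tokens(GPU_HOUR_COST_USD,
                                      BASELINE_TOK_PER_SEC * SPEEDUP)

print(f"baseline:      ${baseline_cost:.4f} per 1M tokens")
print(f"disaggregated: ${disagg_cost:.4f} per 1M tokens")
print(f"cost reduction: {baseline_cost / disagg_cost:.1f}x")
```

The ratio does not depend on the placeholder prices: any fixed hourly cost cancels out, leaving only the throughput multiplier.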

Great work to @inferact & @vllm_project for building this amazing OSS engine & to @NVIDIADC @KranenKyle for building the Dynamo inference orchestrator. More improvements to disagg B200 perf to come!