# AMD MI355 Inference Cost Significantly Lower Than NVIDIA B200 on the GLM5 Architecture

- Source: SemiAnalysis (@SemiAnalysis_)
- Published: 2026-05-20 01:01
- AIHOT score: 56
- AIHOT link: https://aihot.virxact.com/items/cmpcwv4le00t0sljlqu3coubp
- Original link: https://x.com/SemiAnalysis_/status/2056782305440452635

## AI Summary

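According to SemiAnalysis, single-node FP8 serving of the GLM5 architecture now costs about 40% less on AMD MI355 than on NVIDIA B200. The milestone comes 14 weeks after GLM5's initial launch in SGLang v0.12, which supports both non-MTP and MTP serving with speculative decoding on both CUDA and ROCm. SemiAnalysis credits the pace of this work, declaring that "speed is the moat." The next step is for MI355X to catch up with CUDA in two areas: composing production inference optimizations such as FP4, and distributed inference, where ganging up multiple MI355 boxes raises per-GPU performance and thereby further lowers the cost per million tokens.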

## Original Post

AMD ALERT 🚀 MI355 is now 40% cheaper than B200 on GLM5 architecture for Single Node serving FP8 14 weeks after the initial launch of GLM5 on both non-MTP & MTP with spec decode for SGLang v0.12 for both CUDA & ROCm. SPEED IS THE MOAT!! Great work to @AnushElangovan, @roaner, HaiShaw & his team!

Next step is for MI355X to catch up to CUDA when composing production inference optimizations like FP4, and on distributed inferencing, where you can gang up MI355 boxes such that per-GPU performance goes up and thus the cost per million tokens goes down.
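The link between per-GPU performance and cost per million tokens is simple arithmetic: cost per million tokens = (hourly cost of one GPU) / (tokens generated per GPU per hour) × 1,000,000. The sketch below assumes that simplified model, which ignores utilization, batching effects, and networking overhead. The function names and all input numbers are hypothetical illustrations, not measured figures from the post.

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec_per_gpu: float) -> float:
    """USD cost to generate one million tokens on one GPU (simplified model).

    gpu_hour_usd: assumed all-in hourly cost of a single GPU (hypothetical input).
    tokens_per_sec_per_gpu: assumed sustained output throughput of that GPU.
    """
    if tokens_per_sec_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


def relative_saving(cost_new: float, cost_ref: float) -> float:
    """Fraction by which cost_new is cheaper than cost_ref (0.4 means 40% cheaper)."""
    return 1 - cost_new / cost_ref


# Hypothetical numbers only, NOT values reported by SemiAnalysis.
base = cost_per_million_tokens(gpu_hour_usd=2.0, tokens_per_sec_per_gpu=1000)
# If scaling out across boxes doubled per-GPU throughput at the same hourly price,
# the cost per million tokens would halve.
scaled = cost_per_million_tokens(gpu_hour_usd=2.0, tokens_per_sec_per_gpu=2000)
print(f"base=${base:.4f}/M tokens, scaled=${scaled:.4f}/M tokens, "
      f"saving={relative_saving(scaled, base):.0%}")
```

Under this model, a "40% cheaper" claim corresponds to `relative_saving(cost_mi355, cost_b200) == 0.4`, which can come from a lower hourly price, higher per-GPU throughput, or both.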
