Nvidia 发布了 DWDP (Distributed Weight-Data Parallelism),这是一种专注于 prefill 的新推理并行策略。这听起来有点疯狂,直到你想起目标机器是 GB200 NVL72。核心权衡:花费更多 peer-GPU 带宽,从而减少在 collective barriers 上的等待时间。(1/6) 🧵 https://arxiv.org/abs/2604.01621v1
Nvidia published DWDP (Distributed Weight-Data Parallelism), a new inference parallelism strategy focused on prefill. It sounds slightly insane until you remember the target machine is GB200 NVL72. The core trade: spend more peer-GPU bandwidth so you spend less time waiting at collective barriers. (1/6) 🧵 https://arxiv.org/abs/2604.01621v1