DeepSeek开源周第二日推出DeepEP,这是首个面向MoE模型训练与推理的开源EP通信库。该库针对专家并行场景优化,支持NVLink和RDMA的all-to-all通信,既提供高吞吐kernel用于训练与推理预填充,也提供低延迟kernel用于解码阶段。同时原生支持FP8精度,并允许灵活的GPU资源控制以实现计算与通信重叠,显著提升MoE模型效率。
🚀 Day 2 of #OpenSourceWeek: DeepEP
Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.
✅ Efficient and optimized all-to-all communication ✅ Both intranode and internode support with NVLink and RDMA ✅ High-throughput kernels for training and inference prefilling ✅ Low-latency kernels for inference decoding ✅ Native FP8 dispatch support ✅ Flexible GPU resource control for computation-communication overlapping
🔗 GitHub: https://github.com/deepseek-ai/DeepEP