🚀 Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependencies, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts
🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM