SGLang 发布首日支持 DeepSeek-V3.2：集成稀疏注意力机制

2025-09-29 00:00·277天前

AI 摘要

SGLang 在发布首日即支持 DeepSeek-V3.2，该模型基于 DeepSeek-V3.1-Terminus 引入 DeepSeek Sparse Attention (DSA) 机制。DSA 通过 Lightning Indexer 和 Top-k Token Selection 将注意力复杂度从 O(L²) 降至 O(Lk)，在 128K 长上下文下实现训练与推理效率大幅提升且质量损失可忽略。SGLang 实现了专用缓存与 Native Sparse Attention 后端，并提供了面向 NVIDIA、AMD MI350X/MI355X 及 NPU 的部署方案与容器镜像。

原文 · 未翻译

Contents

Installation and QuickStart

Description

DeepSeek Sparse Attention: Long-Context Efficiency Unlocked

Future Work

Acknowledgments

SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention

We are excited to announce that SGLang supports DeepSeek-V3.2 on Day 0! According to the DeepSeek tech report, it equips DeepSeek-V3.1-Terminus with DeepSeek Sparse Attention (DSA) through continued training. With DSA, a fine-grained sparse attention mechanism powered by a lightning indexer, DeepSeek-V3.2 achieves significant efficiency improvements in both training and inference, especially in long-context scenarios. For more details about upcoming features, please check our Roadmap.

Installation and QuickStart

To get started, simply pull the container and launch SGLang as follows:

docker pull lmsysorg/sglang:v0.5.3-cu129 python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --enable-dp-attention

For AMD (MI350X/MI355X):

docker pull lmsysorg/sglang:dsv32-rocm SGLANG_NSA_FUSE_TOPK=false SGLANG_NSA_KV_CACHE_STORE_FP8=false SGLANG_NSA_USE_REAL_INDEXER=true SGLANG_NSA_USE_TILELANG_PREFILL=True python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3.2-Exp --disable-cuda-graph --tp 8 --mem-fraction-static 0.85 --page-size 64 --nsa-prefill "tilelang" --nsa-decode "aiter" SGLANG_NSA_FUSE_TOPK=false SGLANG_NSA_KV_CACHE_STORE_FP8=false SGLANG_NSA_USE_REAL_INDEXER=true SGLANG_NSA_USE_TILELANG_PREFILL=True python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3.2-Exp --disable-cuda-graph --tp 8 --mem-fraction-static 0.85 --page-size 64 --nsa-prefill "tilelang" --nsa-decode "tilelang"

LMSYS：Blog（Chatbot Arena 团队）

导出 Markdown

SGLang 发布首日支持 DeepSeek-V3.2：集成稀疏注意力机制

2025-09-29 00:00·277天前

阅读原文· lmsys.org

AI 摘要

原文 · 保持原样，未翻译

Contents

Installation and QuickStart

Description

DeepSeek Sparse Attention: Long-Context Efficiency Unlocked

Future Work

Acknowledgments

SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention