RT-Lynx: Boosting Diffusion Model Performance by Exploiting GEMM Sparsity the Right Way

2026-05-26 08:00

AI Summary

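RT-Lynx proposes a new paradigm that shifts the acceleration of Diffusion Transformers (DiT) from weight sparsification to activation sparsification. The study finds that DiT activations are intrinsically sparse and tolerate N:M semi-structured sparse pruning better than weights do. By applying N:M sparsification to activations and introducing error-compensation techniques, RT-Lynx preserves generation quality while achieving an average speedup of up to 1.55x in linear layers. The method is validated experimentally on multiple diffusion models.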
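The following minimal pure-Python sketch illustrates what "N:M sparsification on activations" means, using the common 2:4 case. It assumes magnitude-based selection: keep the 2 largest |x| in every 4 consecutive activations along the GEMM reduction dimension K. The summary does not say which selection criterion, memory layout, or error-compensation scheme RT-Lynx actually uses. All function names here (`nm_compress`, `nm_decompress`, `sparse_row_times_matrix`, `dense_row_times_matrix`) are hypothetical. The sketch shows that multiplying only the kept activations reproduces the dense product of the pruned row while performing half the multiplies.

```python
from typing import List, Tuple


def nm_compress(row: List[float], n: int = 2, m: int = 4) -> Tuple[List[float], List[int]]:
    """Keep the n largest-|x| entries in each group of m consecutive activations.

    Assumption: magnitude-based top-n selection; RT-Lynx's actual criterion
    is not specified in the summary. Returns the kept values plus each value's
    position inside its group (0..m-1), a simplified stand-in for index metadata.
    """
    if len(row) % m != 0:
        raise ValueError("activation length must be a multiple of m")
    values: List[float] = []
    positions: List[int] = []
    for start in range(0, len(row), m):
        # Rank the positions in this group by magnitude and keep the top n.
        ranked = sorted(range(m), key=lambda j: abs(row[start + j]), reverse=True)
        for j in sorted(ranked[:n]):  # store kept entries in their original order
            values.append(row[start + j])
            positions.append(j)
    return values, positions


def nm_decompress(values, positions, length, n=2, m=4):
    """Rebuild the dense, zero-filled activation row from the compressed form."""
    row = [0.0] * length
    for idx, (v, j) in enumerate(zip(values, positions)):
        row[(idx // n) * m + j] = v
    return row


def sparse_row_times_matrix(values, positions, w, n=2, m=4):
    """y = x_sparse @ W using only the kept activations; also counts multiplies."""
    y = [0.0] * len(w[0])
    mults = 0
    for idx, (v, j) in enumerate(zip(values, positions)):
        k = (idx // n) * m + j  # original index along the reduction dimension K
        for o in range(len(y)):
            y[o] += v * w[k][o]
            mults += 1
    return y, mults


def dense_row_times_matrix(x, w):
    """Reference dense product y = x @ W; counts every multiply, zeros included."""
    y = [0.0] * len(w[0])
    mults = 0
    for k, xk in enumerate(x):
        for o in range(len(y)):
            y[o] += xk * w[k][o]
            mults += 1
    return y, mults


# One token's activation row (K = 8) and a K x 3 weight matrix (toy values).
x = [0.1, -2.0, 0.5, 3.0, 1.0, 0.0, -0.2, 0.4]
W = [[1.0, float(k), 0.5] for k in range(8)]

vals, pos = nm_compress(x)
y_sparse, sparse_mults = sparse_row_times_matrix(vals, pos, W)
y_ref, dense_mults = dense_row_times_matrix(nm_decompress(vals, pos, len(x)), W)

print(vals, pos)                  # [-2.0, 3.0, 1.0, 0.4] [1, 3, 0, 3]
print(sparse_mults, dense_mults)  # 12 24
```

The mask is recomputed from each input at run time, so the activations are pruned and the weights stay dense. The paper reports custom CUDA kernels that perform this selection and the compressed GEMM on the GPU. This sketch does not model that.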

Original abstract

Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly halve FLOPs, remains underexplored. A key reason is that most existing approaches focus on weight sparsification, and pruning 50% of the weights can remove critical model capacity and degrade generation quality. Our study, however, shows that DiT activations are intrinsically sparse and significantly more robust to N:M semi-structured sparsification than weights. Motivated by this observation, we advocate a paradigm shift from weight sparsification to activation sparsification. We propose RT-Lynx, which applies N:M sparsification to activations and incorporates error-compensation techniques to mitigate accuracy loss. We further implement highly optimized CUDA kernels tailored to this setting, achieving up to a 1.55x speedup on average in linear layers. Extensive experiments across multiple diffusion models demonstrate that our method preserves the generation quality of the original models while substantially accelerating inference.
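As a back-of-the-envelope check on the claim that semi-structured sparsity "can nearly halve FLOPs", consider one linear layer Y = XW. Here X has shape T×K (T tokens, K input features) and W has shape K×D, and one multiply-add counts as 2 FLOPs. These symbols are introduced only for this illustration. If N:M sparsity is applied to X along K, each output element needs only (N/M)·K products:

```latex
\begin{aligned}
\mathrm{FLOPs}_{\text{dense}} &= 2\,T K D \\
\mathrm{FLOPs}_{N:M} &= 2\,T \left(\tfrac{N}{M} K\right) D = \tfrac{N}{M}\,\mathrm{FLOPs}_{\text{dense}} \\
\mathrm{FLOPs}_{2:4} &= \tfrac{1}{2}\,\mathrm{FLOPs}_{\text{dense}}
\end{aligned}
```

This is an idealized bound for the GEMM alone. It ignores runtime top-N selection, index metadata, memory traffic, and the model's non-GEMM operations. That gap is consistent with the abstract reporting an average speedup of up to 1.55x in linear layers rather than 2x.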

HuggingFace Daily Papers (community trending papers)


Read the original: arxiv.org