# RT-Lynx: Exploiting GEMM Sparsity the Right Way to Improve Diffusion Model Performance

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-26 08:00
- AIHOT score: 58
- AIHOT link: https://aihot.virxact.com/items/cmpnfrc8n0x79sl012tzff8ke
- Original link: https://arxiv.org/abs/2605.26632

## AI Summary

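RT-Lynx proposes a new paradigm that shifts the acceleration of Diffusion Transformers (DiT) from weight sparsification to activation sparsification. The study finds that DiT activations are intrinsically sparse and tolerate N:M semi-structured sparsification better than weights do. By applying N:M sparsification to activations and adding error-compensation techniques, RT-Lynx preserves generation quality while achieving up to a 1.55x average speedup in linear layers. The method is validated experimentally on multiple diffusion models.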

## Full Text

Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly halve FLOPs, remains underexplored. A key reason is that most existing approaches focus on weight sparsification, and pruning 50% of the weights can remove critical model capacity and degrade generation quality. Our study, however, shows that DiT activations are intrinsically sparse and significantly more robust to N:M semi-structured sparsification than weights. Motivated by this observation, we advocate a paradigm shift from weight sparsification to activation sparsification. We propose RT-Lynx, which applies N:M sparsification to activations and incorporates error-compensation techniques to mitigate accuracy loss. We further implement highly optimized CUDA kernels tailored to this setting, achieving up to a 1.55x speedup on average in linear layers. Extensive experiments across multiple diffusion models demonstrate that our method preserves the generation quality of the original models while substantially accelerating inference.
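To make the idea of N:M activation sparsification concrete, here is a minimal pure-Python sketch. It assumes a 2:4 pattern and a magnitude-based selection rule (keep the 2 largest-magnitude values in every group of 4 consecutive activations); the abstract does not state which N:M ratio or selection criterion RT-Lynx uses, so both are illustrative assumptions. The names `nm_sparsify` and `matvec` and the sample values are hypothetical, and the sketch leaves out the paper's error compensation and CUDA kernels.

```python
def nm_sparsify(row, n=2, m=4):
    """Keep the n largest-magnitude values in each group of m consecutive entries.

    Assumption: magnitude-based selection; the source does not specify the rule.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    out = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n largest-magnitude entries (ties go to the lower index).
        keep = set(sorted(range(m), key=lambda i: -abs(group[i]))[:n])
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


def matvec(weight, x):
    """Dense y = W x for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(w_row, x)) for w_row in weight]


# Hypothetical activation vector with a few dominant entries per group of 4.
activation = [0.9, -0.05, 0.02, -1.3, 0.0, 0.4, -0.01, 0.03]
# Hypothetical 2x8 weight matrix of a linear layer.
weight = [[0.5, -1.0, 2.0, 0.25, 1.5, -0.5, 1.0, 0.75],
          [1.0, 0.5, -0.5, -1.0, 0.25, 2.0, -1.5, 0.5]]

sparse_activation = nm_sparsify(activation)
dense_out = matvec(weight, activation)
sparse_out = matvec(weight, sparse_activation)

# Largest absolute deviation introduced by pruning the activations.
max_abs_error = max(abs(d - s) for d, s in zip(dense_out, sparse_out))
print("sparse activation:", sparse_activation)
print("max abs error:", max_abs_error)
```

With a 2:4 pattern, half of the multiply-accumulates in each group involve a zero operand, which is where the "nearly halve FLOPs" figure comes from. Realizing that saving in practice requires hardware and kernels that skip the zeros, which this sketch does not attempt.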
