# Q-ARVD: A Quantization Framework for Autoregressive Video Diffusion Models

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-05-20 08:00
- AIHOT score: 53
- AIHOT link: https://aihot.virxact.com/items/cmpgiz0d50fugsljwh7lppfmp
- Original paper: https://arxiv.org/abs/2605.21072

## AI Summary

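Autoregressive video diffusion models hold great potential for real-time video generation and world modeling, but their high inference cost calls for quantization to reduce it. The study finds that directly applying existing quantization methods works poorly, for two main reasons: first, error accumulation during autoregressive generation makes quantization sensitivity severely unbalanced across frames; second, the weights contain prominent outlier channels with diverse patterns. To address this, the paper proposes Q-ARVD, a quantization framework that introduces a quality-aware frame-weighting mechanism to balance the differences between frames, and an outlier-aware adaptive dual-scale quantization method that isolates outlier channels to protect the normal ones. Extensive experiments confirm that the framework clearly improves the performance of quantized models.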
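The abstract does not say how the quality-aware frame weights are computed or how they enter the quantization objective, so the following is only a minimal illustrative sketch under stated assumptions. It assumes the objective is a per-frame reconstruction error (MSE between full-precision and quantized outputs) and that each frame's weight is its normalized sensitivity. As a hypothetical stand-in for the paper's final-quality-aware weights, sensitivities follow `exp(-decay * t)` over frame index `t`, assuming the exponential-like decay runs from the first frame onward. The names `exp_decay_sensitivity`, `frame_weights`, and `weighted_frame_mse` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one frame's flattened output values (assumed representation)


def exp_decay_sensitivity(num_frames: int, decay: float) -> List[float]:
    """Hypothetical sensitivity prior: s_t = exp(-decay * t) for frame index t.

    Stands in for the paper's final-quality-aware weights, which the abstract
    does not specify; real values would be measured, not assumed.
    """
    return [math.exp(-decay * t) for t in range(num_frames)]


def frame_weights(sensitivities: Sequence[float]) -> List[float]:
    """Normalize per-frame sensitivities into weights that sum to 1."""
    total = sum(sensitivities)
    if total <= 0:
        raise ValueError("sensitivities must have a positive sum")
    return [s / total for s in sensitivities]


def weighted_frame_mse(
    fp_frames: Sequence[Frame],
    q_frames: Sequence[Frame],
    weights: Sequence[float],
) -> float:
    """Objective sum_t w_t * MSE(fp_t, q_t) between full-precision and quantized outputs."""
    if not (len(fp_frames) == len(q_frames) == len(weights)):
        raise ValueError("fp_frames, q_frames and weights must have equal length")
    loss = 0.0
    for fp, q, w in zip(fp_frames, q_frames, weights):
        frame_mse = sum((a - b) ** 2 for a, b in zip(fp, q)) / len(fp)
        loss += w * frame_mse
    return loss


if __name__ == "__main__":
    w = frame_weights(exp_decay_sensitivity(num_frames=4, decay=1.0))
    fp = [[0.0, 0.0] for _ in range(4)]
    err_first = [[1.0, 1.0]] + [[0.0, 0.0] for _ in range(3)]
    err_last = [[0.0, 0.0] for _ in range(3)] + [[1.0, 1.0]]
    # The same unit error costs more in the first frame than in the last one.
    print(weighted_frame_mse(fp, err_first, w), weighted_frame_mse(fp, err_last, w))
```

A uniform objective (every weight equal to 1/4) would score both errors identically; the decaying weights make calibration prioritize the frames assumed to be most sensitive.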

## Main Text

Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, which makes model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models.

In this paper, we identify two critical challenges in quantizing ARVDs:

- (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can severely skew quantization sensitivity across frames, following an exponential-like decay pattern.
- (C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels whose patterns vary substantially across layer types and block depths.

To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization:

- (S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality-aware frame-weighting mechanism into the quantization objective.
- (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces outlier-aware adaptive dual-scale quantization, which automatically detects whether a given layer contains outlier channels and how many, then isolates them to protect the normal channels.

Extensive experiments demonstrate the superiority of Q-ARVD.
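To make the idea behind (S2) concrete, here is a minimal dual-scale fake-quantization sketch in pure Python. It is not the paper's algorithm: the outlier test (a channel counts as an outlier if its max |w| exceeds `k` times the median channel max), the symmetric signed-integer grid, the default `k=3.0`, and all function names are hypothetical choices. It only shows the stated principle: detect whether a layer has outlier channels and how many, then quantize them with their own scale so the normal channels keep a fine step size instead of sharing one stretched by the outliers.

```python
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # each row is one output channel of a weight matrix


def detect_outlier_channels(w: Matrix, k: float = 3.0) -> List[int]:
    """Return indices of channels whose max |w| exceeds k * median(max |w|).

    Hypothetical rule (not from the paper). An empty list means the layer has
    no outlier channels, so every channel ends up sharing one scale.
    """
    ch_max = [max(abs(x) for x in row) for row in w]
    med = statistics.median(ch_max)
    return [i for i, m in enumerate(ch_max) if m > k * med]


def _group_scale(w: Matrix, idx: Sequence[int], qmax: int) -> float:
    """Symmetric scale that maps the largest |w| in the group to qmax."""
    m = max(abs(x) for i in idx for x in w[i])
    return m / qmax if m > 0 else 1.0


def _fake_quant(row: List[float], scale: float, qmax: int) -> List[float]:
    """Round onto the signed integer grid [-qmax-1, qmax], then dequantize."""
    return [max(-qmax - 1, min(qmax, round(x / scale))) * scale for x in row]


def dual_scale_quantize(w: Matrix, bits: int = 4, k: float = 3.0) -> Tuple[Matrix, List[int]]:
    """Quantize normal and outlier channels with two separate scales."""
    qmax = 2 ** (bits - 1) - 1
    outliers = detect_outlier_channels(w, k)
    out_set = set(outliers)
    normal = [i for i in range(len(w)) if i not in out_set]
    s_normal = _group_scale(w, normal, qmax)
    s_outlier = _group_scale(w, outliers, qmax) if outliers else s_normal
    wq = [_fake_quant(row, s_outlier if i in out_set else s_normal, qmax)
          for i, row in enumerate(w)]
    return wq, outliers


def single_scale_quantize(w: Matrix, bits: int = 4) -> Matrix:
    """Baseline: one per-tensor scale shared by every channel."""
    qmax = 2 ** (bits - 1) - 1
    s = _group_scale(w, range(len(w)), qmax)
    return [_fake_quant(row, s, qmax) for row in w]


def mse(a: Matrix, b: Matrix, rows: Sequence[int]) -> float:
    """Mean squared error restricted to the given channels."""
    diffs = [(x - y) ** 2 for i in rows for x, y in zip(a[i], b[i])]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    w = [[0.1, -0.2, 0.15], [0.05, 0.12, -0.18], [-0.11, 0.2, 0.08], [8.0, -6.0, 7.5]]
    wq, outliers = dual_scale_quantize(w, bits=4)
    print(outliers)  # → [3]
    normal = [0, 1, 2]
    print(mse(w, wq, normal), mse(w, single_scale_quantize(w, bits=4), normal))
```

In this toy 4-bit example, a single per-tensor scale of 8/7 rounds every weight in the three normal channels to zero, whereas the dual-scale version quantizes them with a step of 0.2/7 and only the outlier channel uses the coarse step.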