AAD-1: An Asymmetric Adversarial Distillation Framework for One-Step Autoregressive Video Generation
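Source: arxiv.org
AAD-1 proposes an asymmetric adversarial distillation framework for one-step autoregressive image-to-video generation. Existing adversarial distillation methods suffer from motion collapse and training instability, which leads to static videos. Architecturally, AAD-1 breaks the symmetry between generator and discriminator: the generator stays causal to preserve autoregressive sampling, while the discriminator attends bidirectionally over the full spatiotemporal context and outputs one holistic realism score for the whole video sequence, allowing it to detect global temporal failures and long-range drift. Training is phased: distribution matching first warms up the one-step generator so that it approaches the teacher distribution, and adversarial distillation begins afterwards. On VBench, AAD-1 achieves state-of-the-art performance for one-step autoregressive video generation.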
We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video generation. State-of-the-art methods adopt adversarial distillation but suffer from motion collapse and training instability, resulting in static videos. AAD-1 addresses these challenges through two key designs in architecture and training strategy. Our key architectural insight is to break the symmetry between generator and discriminator. While the generator remains causal to preserve autoregressive sampling capability, the discriminator attends bidirectionally over the full spatiotemporal context and produces a single holistic realism score for the entire video sequence. This asymmetric design enables the discriminator to effectively detect global temporal failures and long-range drift that cause motion collapse in autoregressive generation. To stabilize training, we introduce a phased strategy that first uses distribution matching to bootstrap a stable one-step generator, providing a warm-up phase that brings the student distribution closer to the teacher before adversarial distillation begins. Extensive experiments on VBench demonstrate that AAD-1 achieves state-of-the-art performance in one-step autoregressive video generation.
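The asymmetry described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame-level masks, the mean pooling used as a stand-in for the holistic realism score, and the names `causal_frame_mask`, `bidirectional_frame_mask`, `holistic_score`, `training_phase`, and `warmup_steps` are all assumptions introduced here for illustration. The abstract does not say whether the distribution-matching loss is kept once the adversarial phase starts, so the sketch only marks which phase a step belongs to.

```python
from typing import List


def causal_frame_mask(num_frames: int) -> List[List[bool]]:
    """Generator side (assumed frame-level): mask[i][j] is True when
    frame i may attend to frame j, i.e. only to itself and earlier frames."""
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


def bidirectional_frame_mask(num_frames: int) -> List[List[bool]]:
    """Discriminator side: every frame may attend to every other frame."""
    return [[True] * num_frames for _ in range(num_frames)]


def holistic_score(per_frame_features: List[float]) -> float:
    """Hypothetical stand-in for the discriminator head: pool over all
    frames and return ONE scalar for the whole sequence, rather than
    one score per frame."""
    return sum(per_frame_features) / len(per_frame_features)


def training_phase(step: int, warmup_steps: int) -> str:
    """Phased schedule (assumed boundary): distribution-matching warm-up
    first, adversarial distillation from step `warmup_steps` onward."""
    return "distribution_matching" if step < warmup_steps else "adversarial"


if __name__ == "__main__":
    for row in causal_frame_mask(3):
        print(row)
    print(training_phase(0, warmup_steps=10), training_phase(10, warmup_steps=10))
```

In this toy form, the causal mask is lower-triangular, which is what allows frame-by-frame autoregressive sampling, while the all-True discriminator mask lets a single score depend on every frame at once, the property the abstract credits with detecting long-range drift.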