One-Forcing: Achieving Stable One-Step Autoregressive Video Generation
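Source: arxiv.org. Existing few-step autoregressive video generation methods suffer from quality degradation and training instability in the one-step setting. To address this, One-Forcing combines the DMD objective with an auxiliary GAN loss, yielding high-quality and efficient one-step video generation. On VBench it achieves a total score of 83.76, the state of the art among one-step causal video generation methods and competitive with strong many-step methods. The work also shows that One-Forcing achieves stable one-step framewise autoregressive generation with only one-third of the training cost of the chunkwise model.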
Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, often distilled from a corresponding many-step teacher, default to a 4-step sampling configuration. This configuration still incurs considerable latency during deployment, and quality degrades severely when the number of sampling steps is reduced further, particularly in the one-step setting. Trajectory-style consistency distillation methods often produce videos with weak dynamics, while DMD-based approaches, such as Self-Forcing, tend to yield blurry frames. To address this challenge, we propose One-Forcing, a simple yet effective approach that augments the DMD objective with an auxiliary GAN loss for high-quality and efficient one-step video generation. Experiments on VBench show that One-Forcing achieves a total score of 83.76, establishing state-of-the-art performance among one-step causal video generation methods and remaining competitive with strong many-step approaches. We further demonstrate that one-step framewise autoregressive generation can be achieved stably with merely one-third of the training cost of the chunkwise model, a setting in which prior methods have failed.
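
The abstract describes the objective only at a high level: a DMD term augmented with an auxiliary GAN term. The following is a minimal numerical sketch of what such a combined generator loss could look like, not the paper's implementation. The function names, the non-saturating form of the GAN loss, the squared-error surrogate for the DMD update, and the `gan_weight` value are all assumptions made for illustration; the text above does not specify any of them.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def dmd_surrogate_loss(x, real_score, fake_score):
    # Assumed DMD-style surrogate: the update direction at a generated
    # sample x is (fake score - real score). In an autodiff framework the
    # target below would be wrapped in a stop-gradient so that the gradient
    # of this loss with respect to x equals that direction (up to scaling).
    direction = fake_score - real_score
    target = x - direction
    return float(0.5 * np.mean((x - target) ** 2))


def gan_generator_loss(fake_logits):
    # Assumed non-saturating generator loss:
    # -log(sigmoid(logit)) == softplus(-logit).
    return float(np.mean(softplus(-fake_logits)))


def combined_generator_loss(x, real_score, fake_score, fake_logits,
                            gan_weight=0.1):
    # gan_weight is a hypothetical hyperparameter, not a value from the paper.
    dmd_term = dmd_surrogate_loss(x, real_score, fake_score)
    gan_term = gan_generator_loss(fake_logits)
    return dmd_term + gan_weight * gan_term


# Toy inputs standing in for one generated frame and the critic outputs.
x = np.zeros(4)
real_score = np.ones(4)
fake_score = 3.0 * np.ones(4)
fake_logits = np.zeros(4)
loss = combined_generator_loss(x, real_score, fake_score, fake_logits)
```

In this toy setup the DMD term depends only on the gap between the two score estimates, while the GAN term depends only on the discriminator logits for generated samples, so the two signals can be weighted independently.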