Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion

2026-06-02 08:00

AI Summary

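Steady-Forcing proposes a memory and training framework that combines a persistent visual anchor (V-Sink), an exponential moving-average motion memory (EMA-Sink), block-relative temporal encoding, periodic cache purification, and distillation from a Wan2.1-14B teacher with motion-rewarded priors. The goal is to preserve background identity and sustain visually plausible fluid dynamics over multi-minute autoregressive generation. Evaluations across seven baselines show improved long-horizon background consistency and imaging quality, and a blind user study indicates stronger perceived stability and motion continuity. The study also finds that VBench aggregate scores under-penalize fixed-camera artifacts: they reward drift-induced optical flow as Dynamic Degree without directly penalizing texture hardening or flow stagnation.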

Original abstract · untranslated

Autoregressive video diffusion models enable streaming generation but often degrade over long rollouts: static scene layouts drift, while mechanisms that improve spatial stability tend to suppress motion, causing natural flows such as water, fire, or smoke to stagnate. We study this stability-motion trade-off in fixed-camera long-horizon nature video generation, where the two failure modes can be more clearly separated than in moving-camera settings. We propose Steady-Forcing, a memory and training framework combining a persistent visual anchor (V-Sink), an exponential moving-average motion memory (EMA-Sink), block-relative temporal encoding, periodic cache purification, and distillation from a Wan2.1-14B teacher with motion-rewarded priors under task-focused configurations. Together, these components are designed to preserve background identity while sustaining visually plausible fluid dynamics over multi-minute autoregressive rollouts. Evaluations across seven baselines show that Steady-Forcing improves long-horizon background consistency and imaging quality, while a blind user study indicates stronger perceived stability and motion continuity. The benchmark evaluation further suggests that generic VBench aggregate scores under-penalize fixed-camera artifacts: they reward drift-induced optical flow as Dynamic Degree while not directly penalizing texture hardening or flow stagnation. This motivates future task-specific benchmarks for static-camera nature-flow evaluation. Project page: https://minar09.github.io/steadyforcing/
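The abstract names an exponential moving-average motion memory (EMA-Sink) but does not say how it is computed. As a rough illustration only, the sketch below shows the generic EMA recurrence, memory = decay * memory + (1 - decay) * new, applied to one feature vector per generated block. The function names `ema_update` and `rollout_memory`, the `decay` parameter, and the use of flat float lists as "features" are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def ema_update(
    memory: Optional[List[float]],
    new_features: Sequence[float],
    decay: float = 0.9,
) -> List[float]:
    """Blend a new feature vector into a running exponential moving average.

    Hypothetical helper: the paper's actual EMA-Sink update is not specified
    in the abstract. `decay` close to 1.0 keeps older content longer.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    if memory is None:
        # First block: initialise the memory with the features themselves.
        return list(new_features)
    if len(memory) != len(new_features):
        raise ValueError("memory and new_features must have the same length")
    return [decay * m + (1.0 - decay) * x for m, x in zip(memory, new_features)]


def rollout_memory(
    blocks: Sequence[Sequence[float]], decay: float = 0.9
) -> Optional[List[float]]:
    """Fold a sequence of per-block feature vectors into a single EMA memory."""
    memory: Optional[List[float]] = None
    for block in blocks:
        memory = ema_update(memory, block, decay)
    return memory
```

With a high `decay`, such a memory changes slowly and summarizes many past blocks in constant space; with a low `decay`, it tracks recent blocks more closely. How Steady-Forcing balances this against the persistent V-Sink anchor is described in the paper itself, not here.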

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
