Lumos-Nexus: An Efficient Frequency-Bridging Training Framework for Unified Video Models Based on a Homogeneous Latent Space

2026-05-29 08:00

AI Summary

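To address the high training cost that existing instruction-driven unified video models incur when they integrate a high-fidelity generator, Lumos-Nexus proposes a training-efficient two-stage framework. During training, only a lightweight generator is aligned with the understanding module, learning to accept reasoning-driven semantic control. During inference, a Unified Progressive Frequency Bridging mechanism progressively hands generation off to a high-capacity pretrained generator in the shared latent space, enabling coarse-to-fine refinement and high-quality video. To evaluate this capability, the authors also release a new benchmark, VR-Bench. Experiments show that the model substantially improves visual realism and temporal coherence on VBench and delivers strong reasoning-based generation performance on VR-Bench.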

Original abstract

Connector-based video unified models have demonstrated strong capability in instruction-grounded video synthesis, but integrating a large high-fidelity generator into the unified training loop is computationally prohibitive, limiting achievable visual quality. We therefore propose Lumos-Nexus, a training-efficient unified video generation framework that facilitates the development of strong reasoning-driven generation capabilities while significantly enhancing visual fidelity. Lumos-Nexus adopts a two-stage design: 1) During training, only a lightweight generator is aligned with the understanding block to learn to take in reasoning-driven semantic control. 2) During inference, we introduce Unified Progressive Frequency Bridging (UPFB) to progressively hand off generation to a high-capacity pretrained generator in the shared latent space, enabling coarse-to-fine refinement and producing high-fidelity videos without compromising reasoning quality. To fill the gap in reasoning-driven video generation benchmarks, we introduce VR-Bench, which assesses a model's capability to translate inferred intent into coherent and semantically aligned video content. Extensive experiments demonstrate that Lumos-Nexus achieves substantial gains in visual realism and temporal coherence on VBench, while exhibiting strong reasoning-based generative performance on VR-Bench. Code and models are available at https://jiazheng-xing.github.io/nexus-lumos-home/.
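The abstract does not spell out how UPFB works, so the following is only an illustrative sketch of one way a progressive frequency handoff between two generators sharing a latent space could look. It assumes, without confirmation from the source, that low spatial frequencies are taken from the lightweight generator's latent, high frequencies from the high-capacity generator's latent, and that the cutoff shrinks linearly across steps. The names `lowpass_mask`, `bridge`, and `cutoff_schedule` are hypothetical, and random arrays stand in for real latents.

```python
import numpy as np

def lowpass_mask(shape, cutoff):
    """Radial low-pass mask over the last two axes.

    cutoff is a normalized radius: 0.0 keeps nothing, values above 1.0 keep everything.
    """
    h, w = shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in [-0.5, 0.5)
    radius = np.sqrt(fy ** 2 + fx ** 2) / np.sqrt(0.5)  # scaled to [0, 1]
    return radius < cutoff

def bridge(z_light, z_heavy, cutoff):
    """Mix two latents: frequencies below cutoff from z_light, the rest from z_heavy.

    Assumption: both latents live in the same latent space and have the same shape.
    """
    mask = lowpass_mask(z_light.shape, cutoff)
    f_light = np.fft.fft2(z_light, axes=(-2, -1))
    f_heavy = np.fft.fft2(z_heavy, axes=(-2, -1))
    mixed = np.where(mask, f_light, f_heavy)  # mask broadcasts over leading axes
    return np.fft.ifft2(mixed, axes=(-2, -1)).real

def cutoff_schedule(num_steps):
    """Hypothetical linear schedule: mostly light generator first, heavy generator last."""
    return np.linspace(1.0, 0.0, num_steps)

# Stand-in latents with layout (frames, channels, height, width).
rng = np.random.default_rng(0)
z_light = rng.standard_normal((4, 8, 16, 16))
z_heavy = rng.standard_normal((4, 8, 16, 16))

z = z_light
for cutoff in cutoff_schedule(5):
    # In a real pipeline both latents would be re-predicted at every step;
    # here they are fixed so that only the frequency handoff is visible.
    z = bridge(z_light, z_heavy, cutoff)
```

With this schedule the final step uses a cutoff of 0.0, so the output equals the high-capacity generator's latent, while earlier steps keep the coarse structure of the lightweight generator's latent. Whether the actual method uses a hard mask, a linear schedule, or spatial rather than spatiotemporal frequencies is not stated in the abstract.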

HuggingFace Daily Papers (community trending papers)

Original paper: arxiv.org
