OSP-Next: An Efficient Model for High-Quality Video Generation

2026-05-27 08:00

AI Summary

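OSP-Next is an efficient text-to-video model that targets the efficiency bottleneck of full attention in Diffusion Transformers. It uses a hybrid full-sparse attention architecture whose sparse component is based on Skiparse-2D Attention. Building on this, it introduces a Sparse Sequence Parallelism strategy that cuts communication volume by 75% compared with Ulysses sequence parallelism. The model also integrates HiF8 quantization and Mix-GRPO post-training. Experiments show that OSP-Next surpasses the Wan2.1 baseline on VBench (83.73% total score) and reaches up to 1.64× single-GPU and over 1.52× eight-GPU speedup on NVIDIA H200. Its quantized version, OSP-Next-HiF8, loses only 0.4% in VBench total score while achieving 1.69× and 2.27× speedups on a single Ascend 950PR.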

Original abstract

Diffusion Transformers achieve strong video generation quality, but the quadratic cost of full attention limits efficiency. We introduce OSP-Next, an efficient text-to-video generation model that integrates sparse attention, parallelism, quantization, and reinforcement learning. OSP-Next uses a hybrid full-sparse attention architecture, where the sparse component is implemented with Skiparse-2D Attention. This fixed-pattern mechanism applies token-wise and group-wise sparse attention along spatial dimensions, leveraging locality while maintaining native compatibility with FlashAttention kernels. Based on the local equivalence of rearrangement in Skiparse-2D Attention, we further propose Sparse Sequence Parallelism (SSP), which partitions subsequences across ranks and switches sparse patterns through a single All-to-All communication. Compared with Ulysses Sequence Parallelism (SP), SSP provides a native parallel strategy for sparse attention and reduces communication volume by 75%. OSP-Next also incorporates HiF8 quantization to enable stable joint training with 8-bit quantization and sparse fine-tuning, and applies Mix-GRPO post-training to improve the performance of the sparse model. Experiments show that OSP-Next achieves a VBench total score of 83.73%, surpassing the Wan2.1 baseline. Under the 5-second 720P and 5-second 768P settings, OSP-Next achieves up to 1.64× single-GPU speedup and over 1.52× eight-GPU speedup on NVIDIA H200 GPUs. In addition, with only a 0.4% drop in VBench total score, OSP-Next-HiF8 achieves 1.69× and 2.27× speedups under the two settings on a single Ascend 950PR, demonstrating the efficiency and performance of OSP-Next across hardware platforms.
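The abstract does not give the exact index layout of Skiparse attention, so the NumPy sketch below is a 1D simplification built on stated assumptions, not the paper's implementation. Assumed: a sparsity factor `s`, a sequence length divisible by `s*s`, "token-wise" meaning tokens that share `i mod s`, and "group-wise" meaning contiguous size-`s` blocks that share their block index mod `s`. All function names are hypothetical. The sketch checks two properties the abstract relies on: each sparse pattern turns into ordinary dense attention after a reshape and transpose (which is why a stock dense kernel such as FlashAttention can be reused), and switching from one pattern to the other is a fixed permutation that, if each rank holds one subsequence, amounts to one All-to-All plus a local reorder. Skiparse-2D applies such patterns along the height and width axes, which is not modeled here.

```python
import numpy as np


def dense_attention(q, k, v):
    """Plain softmax attention over the last two axes: (..., n, d)."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def to_subsequences(x, s, mode):
    """Hypothetical helper: rearrange x (n, d) into s subsequences (s, n // s, d).

    Position i is written as i = a*s*s + b*s + c (assumes n % (s*s) == 0).
    mode="token": tokens sharing c (= i mod s) form one subsequence.
    mode="group": tokens sharing b (their size-s block index mod s) form one.
    """
    n, d = x.shape
    assert n % (s * s) == 0
    t = x.reshape(n // (s * s), s, s, d)              # axes (a, b, c, d)
    order = (2, 0, 1, 3) if mode == "token" else (1, 0, 2, 3)
    return t.transpose(order).reshape(s, n // s, d)   # (c, a, b) or (b, a, c)


def from_subsequences(y, s, mode):
    """Inverse of to_subsequences."""
    _, m, d = y.shape
    t = y.reshape(s, m // s, s, d)
    order = (1, 2, 0, 3) if mode == "token" else (1, 0, 2, 3)
    return t.transpose(order).reshape(m * s, d)       # back to (a, b, c) order


def skiparse_attention(q, k, v, s, mode):
    """Sparse attention computed as dense attention inside each subsequence."""
    qs, ks, vs = (to_subsequences(t, s, mode) for t in (q, k, v))
    return from_subsequences(dense_attention(qs, ks, vs), s, mode)


def masked_reference(q, k, v, s, mode):
    """The same sparsity pattern expressed as full attention with a boolean mask."""
    idx = np.arange(q.shape[0])
    key = idx % s if mode == "token" else (idx // s) % s
    allowed = key[:, None] == key[None, :]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v


def switch_token_to_group(y_tok, s):
    """Token-wise layout -> group-wise layout (hypothetical SSP-style switch).

    If rank c holds y_tok[c], this equals one all-to-all (rank c sends the chunk
    with block index b to rank b) followed by a local reorder of received chunks.
    """
    _, m, d = y_tok.shape
    chunks = y_tok.reshape(s, m // s, s, d)                 # (c, a, b, d)
    return chunks.transpose(2, 1, 0, 3).reshape(s, m, d)    # (b, a, c, d)


rng = np.random.default_rng(0)
n, d, s = 64, 8, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
for mode in ("token", "group"):
    print(mode, np.allclose(skiparse_attention(q, k, v, s, mode),
                            masked_reference(q, k, v, s, mode)))
print(np.array_equal(switch_token_to_group(to_subsequences(q, s, "token"), s),
                     to_subsequences(q, s, "group")))
```

All three checks print `True`. The design point the sketch is meant to surface is that no custom sparse kernel is needed: sparsity is expressed purely as data movement around a dense attention call.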

HuggingFace Daily Papers (trending community papers)


Read the original: arxiv.org