# Light Interaction： 交互式视频世界模型的免训练推理加速

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-29 08:00
- AIHOT 分数：63
- AIHOT 链接：https://aihot.virxact.com/items/cmpulkg6204v9slagnmtb6tz4
- 原文链接：https://arxiv.org/abs/2605.31158

## AI 摘要

Light Interaction是一个用于交互式视频世界模型的免训练推理加速框架。其核心是利用交互特性实现轨迹依赖的自适应计算，具体包括自适应上下文管理、去噪缓存加速以及硬件软件协同设计的3D块稀疏注意力。在HY-WorldPlay和Matrix-Game-3.0上的评估表明，该框架无需重新训练模型，可实现最高2.59倍的推理加速，同时保持有竞争力的视觉质量。

## 正文

Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference acceleration framework for interactive video world models. Our key insight is that interaction naturally enables trajectory-dependent adaptive computation: retrieved spatial memory can be discarded during novel exploration, temporal context can be adjusted according to local latent dynamics, and early-step model outputs can be reused when the camera revisits familiar regions. Based on this insight, Light Interaction combines adaptive context management, denoising cache acceleration, and hardware-software co-designed 3D block sparse attention with fused Triton kernels. Evaluated on HY-WorldPlay and Matrix-Game-3.0, Light Interaction achieves up to 2.59x speedup without model retraining while maintaining competitive visual quality.
