Fast LeWorldModel
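Fast-LeWM is a fast latent world model built on JEPA and LeWM. It replaces LeWM's repeated one-step latent rollout with action-prefix prediction: it encodes the prefixes of a candidate action sequence and predicts the corresponding future latents in parallel. Prefix-level supervision teaches the model how states evolve continuously under different prefixes, so during planning it can evaluate the future latent directly from the last prefix token without stepping through each intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success rate over LeWM, substantially shortens planning time, and achieves an open-loop latent loss that grows markedly more slowly as the rollout horizon increases.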
Joint-Embedding Predictive Architectures (JEPAs), including the recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action sequences by repeatedly applying a local one-step latent transition model. This autoregressive rollout makes planning computationally expensive and exposes the predicted trajectory to accumulated latent errors as the horizon grows. We propose Fast LeWorldModel (Fast-LeWM), a fast latent world model that replaces repeated local rollout with action-prefix prediction. Given the current latent and a candidate action sequence, Fast-LeWM encodes the sequence's prefixes and predicts, in parallel, the future latent reached after executing each prefix. By making action prefixes the basic prediction unit, Fast-LeWM directly models the accumulated effect of actions over multiple horizons. This prefix-level supervision forces the model to learn how states continuously evolve under different action prefixes, rather than only fitting one-step state transitions. During planning, the predictor can use the last prefix token from the encoded action sequence to evaluate the corresponding future latent without explicitly rolling through each intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success over LeWM while substantially reducing planning time. It also achieves lower open-loop latent loss, and that loss grows significantly more slowly as the rollout horizon increases.
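To make the contrast between autoregressive rollout and action-prefix prediction concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: every name (`one_step`, `encode_prefixes`, `predict_from_prefix`, `plan_cost`) is hypothetical, and the learned networks are replaced by an additive stand-in (next latent = latent + action) chosen only so that both call patterns can be run and compared. The squared-distance-to-goal cost is likewise an assumption for illustration.

```python
from itertools import accumulate
from typing import List, Sequence

Vec = List[float]


def add(u: Sequence[float], v: Sequence[float]) -> Vec:
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def one_step(z: Vec, a: Vec) -> Vec:
    """Hypothetical local transition; stands in for a learned one-step predictor."""
    return add(z, a)


def rollout(z0: Vec, actions: Sequence[Vec]) -> List[Vec]:
    """Autoregressive rollout: one sequential predictor call per action."""
    z, traj = z0, []
    for a in actions:
        z = one_step(z, a)  # each step consumes the previous prediction
        traj.append(z)
    return traj


def encode_prefixes(actions: Sequence[Vec]) -> List[Vec]:
    """Toy causal encoder: token k summarizes actions[0..k] (a running sum here)."""
    return list(accumulate(actions, add))


def predict_from_prefix(z0: Vec, token: Vec) -> Vec:
    """Toy prefix predictor: maps (current latent, prefix token) to a future latent."""
    return add(z0, token)


def plan_cost(z0: Vec, actions: Sequence[Vec], goal: Vec) -> float:
    """Score a candidate from its last prefix token only (assumed squared-distance cost)."""
    z_final = predict_from_prefix(z0, encode_prefixes(actions)[-1])
    return sum((x - g) ** 2 for x, g in zip(z_final, goal))


# Toy planning problem: pick the candidate whose final latent is closest to the goal.
z0 = [0.0, 0.0]
goal = [2.0, 3.0]
candidates = [
    [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
]
best = min(candidates, key=lambda acts: plan_cost(z0, acts, goal))
```

Because the stand-in dynamics are additive, the prefix predictions coincide exactly with the rollout here; with learned networks they generally would not, and the abstract's claim is that the prefix formulation accumulates less error. The structural point is that `rollout` needs one dependent call per step, whereas `plan_cost` touches only the final prefix token and never materializes intermediate imagined states.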