Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

2026-06-04 08:00

AI Summary

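Autonomous driving requires reasoning about how ego actions affect the evolution of the surrounding world. Existing end-to-end methods rely on direct state-to-action mappings and do not explicitly model action-conditioned dynamics, while continuous-latent world models lack the structure needed for compositional causal reasoning. Discrete-WAM proposes a unified latent vision-action world policy that represents future visual states and ego actions as aligned discrete tokens. Within a single discrete diffusion framework, it jointly realizes world modeling, a world-action policy, and a hierarchical decision policy, supporting compositional causal reasoning across alternative futures as well as controllable generation. It achieves competitive performance on large-scale autonomous-driving benchmarks.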
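The summary says future visual states and ego actions are represented as aligned discrete tokens, but it does not say how actions are discretized. The sketch below is a minimal illustration under loudly stated assumptions: uniform binning of each action dimension, a hypothetical visual codebook of 1024 entries, hypothetical action ranges for acceleration and steering, and a per-timestep interleaved layout `[frame_t visual tokens, action_t tokens]`. None of these choices come from the paper; they only show one way continuous actions can share a token vocabulary and a time axis with visual tokens.

```python
from typing import List, Sequence, Tuple

# Hypothetical sizes and ranges: the source does not specify the codebook size,
# number of action bins, or action value ranges. These are illustrative assumptions.
VISUAL_VOCAB = 1024                          # visual token ids occupy [0, 1024)
ACTION_BINS = 256                            # uniform bins per action dimension
ACTION_RANGES = [(-4.0, 4.0), (-0.5, 0.5)]   # e.g. (acceleration m/s^2, steering rad)


def quantize_action(action: Sequence[float]) -> List[int]:
    """Map a continuous ego action to one discrete token per dimension."""
    tokens = []
    for dim, (value, (lo, hi)) in enumerate(zip(action, ACTION_RANGES)):
        clipped = min(max(value, lo), hi)               # clamp out-of-range actions
        bin_idx = int((clipped - lo) / (hi - lo) * ACTION_BINS)
        bin_idx = min(bin_idx, ACTION_BINS - 1)         # value == hi goes to last bin
        # Give each action dimension its own id range after the visual vocabulary.
        tokens.append(VISUAL_VOCAB + dim * ACTION_BINS + bin_idx)
    return tokens


def dequantize_token(token: int) -> Tuple[int, float]:
    """Recover (dimension, bin-center value) from an action token id."""
    dim, bin_idx = divmod(token - VISUAL_VOCAB, ACTION_BINS)
    lo, hi = ACTION_RANGES[dim]
    width = (hi - lo) / ACTION_BINS
    return dim, lo + (bin_idx + 0.5) * width


def interleave(frames: List[List[int]], actions: List[Sequence[float]]) -> List[int]:
    """Build one time-aligned sequence: [frame_0, action_0, frame_1, action_1, ...]."""
    if len(frames) != len(actions):
        raise ValueError("need one action per frame")
    seq: List[int] = []
    for visual_tokens, action in zip(frames, actions):
        if not all(0 <= t < VISUAL_VOCAB for t in visual_tokens):
            raise ValueError("visual token id outside the visual vocabulary")
        seq.extend(visual_tokens)
        seq.extend(quantize_action(action))
    return seq
```

Sharing one id space means a single token-level model can read and generate both modalities; the price of uniform binning is a reconstruction error of at most half a bin width per dimension.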

Original abstract

Autonomous driving requires reasoning about how ego actions shape the evolution of the surrounding world. However, most end-to-end methods rely on direct state-to-action mappings, capturing correlations without explicitly modeling action-conditioned dynamics. Conversely, continuous-latent world models often lack compositional structure for causal reasoning across counterfactual futures. We introduce Discrete-WAM, a unified latent vision-action world policy that represents future visual states and ego actions as aligned discrete tokens, enabling compositional causal reasoning across alternative futures. Built upon this unified discrete alignment, Discrete-WAM establishes a shared discrete diffusion framework with unified generative tasks, jointly formulating world modeling, world-action policy, and hierarchical decision-enabled policy, supporting compositional generalization across diverse driving scenarios. Experiments on large-scale autonomous-driving benchmarks show that Discrete-WAM achieves competitive performance while supporting controllable generation and counterfactual reasoning, offering a principled path toward more reliable decision-making.
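The abstract says world modeling and the policies are formulated as unified generative tasks in one shared discrete diffusion framework, without giving the corruption process. The sketch below assumes an absorbing-state (masked) discrete diffusion, which is a swapped-in illustrative choice rather than the paper's confirmed method. In this reading, each task is simply a different choice of which token roles get masked and regenerated, while the rest stay visible as conditioning. The task names, role labels, and `MASK` id are hypothetical, and the hierarchical decision-enabled policy is not modeled here.

```python
import random
from typing import List, Sequence

MASK = -1  # hypothetical absorbing "[MASK]" token id

# Hypothetical role labels for positions in a time-aligned vision-action sequence.
CONTEXT_VISUAL, FUTURE_VISUAL, ACTION = "ctx_vis", "fut_vis", "act"

# Assumption: each task is defined by which roles it regenerates;
# all other positions are kept visible as conditioning.
TASK_TARGETS = {
    "world_model": {FUTURE_VISUAL},                  # future frames given actions
    "action_policy": {ACTION},                       # actions given observations
    "world_action_policy": {FUTURE_VISUAL, ACTION},  # future frames and actions jointly
}


def target_mask(roles: Sequence[str], task: str) -> List[bool]:
    """True where the task asks the model to generate the token."""
    targets = TASK_TARGETS[task]
    return [role in targets for role in roles]


def forward_noise(tokens: Sequence[int], roles: Sequence[str], task: str,
                  t: float, rng: random.Random) -> List[int]:
    """Absorbing-state forward process: each target token becomes MASK with prob. t."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    is_target = target_mask(roles, task)
    return [MASK if tgt and rng.random() < t else tok
            for tok, tgt in zip(tokens, is_target)]


def loss_positions(noised: Sequence[int]) -> List[int]:
    """Positions where a denoiser would be trained to predict the original token."""
    return [i for i, tok in enumerate(noised) if tok == MASK]
```

For example, with roles `[ctx_vis, ctx_vis, fut_vis, fut_vis, act, act]` and `t = 1.0`, the world-model task masks only the two future-frame tokens, while the action-policy task masks only the two action tokens. One network trained on a mix of these masking patterns can then serve every task at inference time.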

HuggingFace Daily Papers (community trending papers)

Source: arxiv.org
