# Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-04 08:00
- AIHOT score: 65
- AIHOT link: https://aihot.virxact.com/items/cmq0bomkt04pesltr0kex4i5p
- Original link: https://arxiv.org/abs/2606.05645

## AI Summary

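Autonomous driving requires reasoning about how ego actions affect the evolution of the world. Existing end-to-end methods rely on direct state-to-action mappings and do not explicitly model action-conditioned dynamics, while continuous-latent world models lack compositional causal reasoning. Discrete-WAM proposes a unified latent vision-action world policy that represents future visual states and ego actions as aligned discrete tokens. Within a discrete diffusion framework, it jointly realizes world modeling, a world-action policy, and a hierarchical decision policy, supporting compositional causal reasoning across alternative futures as well as controllable generation. It achieves competitive performance on large-scale autonomous-driving benchmarks.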

## Main Text

Autonomous driving requires reasoning about how ego actions shape the evolution of the surrounding world. However, most end-to-end methods rely on direct state-to-action mappings, capturing correlations without explicitly modeling action-conditioned dynamics. Conversely, continuous-latent world models often lack compositional structure for causal reasoning across counterfactual futures. We introduce Discrete-WAM, a unified latent vision-action world policy that represents future visual states and ego actions as aligned discrete tokens, enabling compositional causal reasoning across alternative futures. Built upon this unified discrete alignment, Discrete-WAM establishes a shared discrete diffusion framework with unified generative tasks, jointly formulating world modeling, world-action policy, and hierarchical decision-enabled policy, supporting compositional generalization across diverse driving scenarios. Experiments on large-scale autonomous-driving benchmarks show that Discrete-WAM achieves competitive performance while supporting controllable generation and counterfactual reasoning, offering a principled path toward more reliable decision-making.
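To make the idea of "aligned discrete tokens" with "unified generative tasks" more concrete, the sketch below shows one way a single token sequence could serve several tasks under masked discrete diffusion, simply by changing which segment is masked. The abstract does not specify the tokenizer, vocabulary sizes, sequence layout, or masking schedule, so everything here is an illustrative assumption rather than the authors' implementation: the names `VISUAL_VOCAB`, `ACTION_VOCAB`, `MASK_ID`, `action_to_token`, `build_sequence`, `TASK_MASKS`, and `mask_for_task` are hypothetical.

```python
import random

# Hypothetical sizes, chosen only for illustration (not from the paper).
VISUAL_VOCAB = 1024   # ids 0..1023 stand in for visual codebook entries
ACTION_VOCAB = 256    # ids 1024..1279 stand in for quantized ego actions
MASK_ID = VISUAL_VOCAB + ACTION_VOCAB  # one shared mask token id

def action_to_token(value, low=-1.0, high=1.0):
    """Uniformly quantize a scalar ego action into the action id range."""
    clipped = min(max(value, low), high)
    bin_index = int((clipped - low) / (high - low) * (ACTION_VOCAB - 1) + 0.5)
    return VISUAL_VOCAB + bin_index

def build_sequence(current_visual, actions, future_visual):
    """Concatenate observation, action, and future segments in one id space."""
    tokens = (list(current_visual)
              + [action_to_token(a) for a in actions]
              + list(future_visual))
    segments = (["obs"] * len(current_visual)
                + ["act"] * len(actions)
                + ["fut"] * len(future_visual))
    return tokens, segments

# Assumed mapping from task name to the segments that must be generated.
TASK_MASKS = {
    "world_model": {"fut"},                 # future visuals given obs + actions
    "world_action_policy": {"act", "fut"},  # actions and future visuals jointly
}

def mask_for_task(tokens, segments, task, mask_ratio=1.0, rng=None):
    """Replace target-segment tokens with MASK_ID with probability mask_ratio."""
    rng = rng or random.Random(0)
    targets = TASK_MASKS[task]
    return [MASK_ID if seg in targets and rng.random() < mask_ratio else tok
            for tok, seg in zip(tokens, segments)]

tokens, segments = build_sequence([1, 2, 3], [-1.0, 1.0], [4, 5])
world_model_input = mask_for_task(tokens, segments, "world_model")
policy_input = mask_for_task(tokens, segments, "world_action_policy")
```

In this toy layout, a denoising model would be trained to recover the masked positions; lowering `mask_ratio` below 1.0 would correspond to an intermediate corruption level. Under these assumptions, a counterfactual query amounts to swapping the action tokens and regenerating the future segment.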