# A2World: Learning Transferable Dynamics Priors from Action-to-World Modeling

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-28 08:00
- AIHOT score: 41
- AIHOT link: https://aihot.virxact.com/items/cmr0ducpf00y2slolrazo9ekg
- Original link: https://arxiv.org/abs/2606.29501

## AI Summary

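The study proposes A2World, a multi-view interactive base diffusion world model. By pretraining on large-scale robot manipulation data, it learns to model action-driven visual evolution as a transferable dynamics prior. The pretrained weights can be adapted into two kinds of models: A2World-sim, a task- or scene-specialized simulator used for policy evaluation and what-if analysis, and A2World-policy, a video-action joint prediction model that predicts actions under visual and instruction conditioning. Experiments show that this pretraining provides transferable dynamics priors for both simulator-centric and policy-centric robot learning.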

## Main Text

We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pretrain a multi-view interactive base diffusion world model, A2World, on large-scale robot manipulation data with real action annotations. We validate the learned dynamics priors from two complementary perspectives. First, we adapt A2World into a task- or scene-specialized real-world simulator, A2World-sim, whose long-horizon rollouts support simulator-based policy evaluation and scalable what-if analysis by replacing real-robot rollouts with world model rollouts. Second, starting from the same pretrained weights, we adapt A2World into a video-action joint prediction model, A2World-policy, that predicts actions under visual and instruction conditioning. Experiments across simulation benchmarks and real-robot settings demonstrate that action-conditioned world model pretraining yields transferable dynamics priors that benefit both simulator-centric and policy-centric robot learning.
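The idea of replacing real-robot rollouts with world model rollouts for policy evaluation can be illustrated with a toy sketch. This is not the A2World implementation: `toy_world_model`, `toy_policy`, `rollout`, and `success_rate` are hypothetical names, the state is a short list of floats standing in for multi-view image observations, and the dynamics are a hand-written stand-in for a learned diffusion model. It only shows the general evaluation loop, under those assumptions.

```python
import random
from typing import Callable, List

State = List[float]   # assumption: stand-in for multi-view image observations
Action = List[float]  # assumption: stand-in for robot action vectors


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical dynamics: predict the next state from state and action."""
    return [s + a for s, a in zip(state, action)]


def toy_policy(state: State, goal: State) -> Action:
    """Hypothetical policy: take a bounded step (at most 0.1) toward the goal."""
    return [max(-0.1, min(0.1, g - s)) for s, g in zip(state, goal)]


def rollout(
    world_model: Callable[[State, Action], State],
    policy: Callable[[State, State], Action],
    start: State,
    goal: State,
    horizon: int,
) -> List[State]:
    """Roll the policy forward inside the world model; no real robot involved."""
    state = list(start)
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state, goal)
        state = world_model(state, action)
        trajectory.append(state)
    return trajectory


def success_rate(
    world_model: Callable[[State, Action], State],
    policy: Callable[[State, State], Action],
    starts: List[State],
    goal: State,
    horizon: int,
    tol: float = 1e-3,
) -> float:
    """Fraction of imagined rollouts whose final state lands within tol of the goal."""
    successes = 0
    for start in starts:
        final = rollout(world_model, policy, start, goal, horizon)[-1]
        if all(abs(f - g) <= tol for f, g in zip(final, goal)):
            successes += 1
    return successes / len(starts)


rng = random.Random(0)
goal = [0.0, 0.0]
starts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]

# "What-if" comparison: same policy and start states, different rollout horizons.
short_rate = success_rate(toy_world_model, toy_policy, starts, goal, horizon=5)
long_rate = success_rate(toy_world_model, toy_policy, starts, goal, horizon=20)
print(f"success@5={short_rate:.2f}  success@20={long_rate:.2f}")
```

In this framing, a what-if analysis amounts to changing one ingredient (the policy, the start states, or the horizon) and re-running the imagined rollouts. How well such estimates track real-robot performance depends on the fidelity of the world model over long horizons, which is what the abstract's A2World-sim experiments are meant to assess.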