# PhysiFormer： 世界坐标中的扩散 Transformer 模拟物理可信 3D 物体运动

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-06-25 08:00
- AIHOT 分数：51
- AIHOT 链接：https://aihot.virxact.com/items/cmqwqyy1k0072slikpgyocg2n
- 原文链接：https://arxiv.org/abs/2606.27364

## AI 摘要

PhysiFormer 是一种扩散 Transformer 模型，用于物理可信的 3D 物体运动模拟。它将物体表示为世界坐标下的 3D 网格，输入初始顶点位置、速度及材料类型（刚性或弹性），通过去噪扩散过程直接采样未来顶点轨迹，不依赖显式归纳偏置。概率性公式捕捉动力学不确定性，生成多种合理未来。模型在时间、空间和物体维度上分解注意力，实现置换不变的多物体推理。基于 10 万+模拟轨迹训练，可生成刚体和弹性力学，并泛化至混合材料、未见真实几何及更多物体场景，在轨迹精度、刚性保持和动量一致性上显著优于自回归基线。

## 正文

We present PhysiFormer, a diffusion transformer for physically-plausible 3D object motion. Unlike video world models that operate in view-dependent pixel space, PhysiFormer represents objects as 3D meshes expressed in world coordinates. Given the initial vertex positions and velocities, as well as object material type, rigid or elastic, the model samples future vertex trajectories. While related neural physics approaches build on ad-hoc latent spaces or explicitly enforce rigidity and causality, PhysiFormer shows that excellent results can be obtained without any such inductive biases, by casting vertex trajectory prediction as a single denoising diffusion process directly in world coordinates. The probabilistic formulation captures uncertainty in the learned dynamics, enabling diverse plausible futures from initial conditions, making this framework potentially useful for applications with unobserved uncertainty. The model features attention factorised over time, space, and objects for efficiency, enabling permutation-invariant multi-object reasoning without needing explicit object encoding. Trained on over 100k simulated trajectories, PhysiFormer generates rigid and elastic mechanics, and generalises to mixed-material settings, unseen real-world geometries, and larger object counts. It substantially outperforms autoregressive baselines in trajectory accuracy, rigidity preservation, and momentum-based physical consistency. Our results position coordinate-space diffusion as a promising step toward view-invariant, geometry-aware world modelling for robotics, graphics, and physical design. Visualisations, code, and models are available at https://yimingc9.github.io/physiformer.
