# MolmoMotion: A Language-Instructed 3D Point Trajectory Prediction Model

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-17 08:00
- AIHOT score: 49
- AIHOT link: https://aihot.virxact.com/items/cmqjjxug90415slmh8zo23lzx
- Original link: https://arxiv.org/abs/2606.18558

## AI Summary

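MolmoMotion formalizes motion forecasting as goal-conditioned 3D point motion prediction: given a short visual history, a set of 3D query points on an object, and a goal described in language, it predicts the future 3D trajectory of each point. The work comprises three components: the MolmoMotion-1M dataset (action descriptions and 3D point trajectories annotated from 1.16M unconstrained videos), the human-verified PointMotionBench benchmark (covering 111 object categories and 61 motion types), and the MolmoMotion model (supporting both autoregressive coordinate prediction and flow-matching trajectory generation). The model predicts diverse motions from language instructions, significantly outperforms existing methods on the benchmark, and learns a 3D motion prior that transfers to robot manipulation and video generation.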

## Main Text

Motion forecasting is central to visual intelligence: agents must anticipate how objects will move in order to plan actions, reason about physical interactions, and synthesize realistic futures. We argue that 3D points in world coordinates provide a general representation that is class-agnostic, view-stable, compact, and directly useful for downstream tasks. We formalize the task of goal-conditioned 3D point motion forecasting: given a short visual history, a set of 3D query points on an object of interest, and a language description of the intended goal, the model predicts the future 3D trajectory of each point. We introduce a full stack to study this task at scale: (1) MolmoMotion-1M is a large corpus of action-described, object-grounded 3D point trajectories annotated from 1.16M unconstrained videos; (2) PointMotionBench is a human-verified benchmark spanning 111 object categories and 61 motion types; and (3) MolmoMotion is a general motion forecasting model that supports both autoregressive coordinate prediction and flow-matching-based trajectory generation. MolmoMotion accurately predicts diverse motion patterns with different language instructions, and significantly outperforms existing motion prediction baselines on PointMotionBench. Finally, we show that the learned 3D motion prior transfers well to downstream applications: it improves training efficiency and generalization for robot manipulation, and its predicted trajectories provide effective motion guidance for generative models to synthesize videos with more realistic object motion.
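The task interface described above (query points and a language goal in, one future 3D trajectory per point out) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `ForecastQuery` container, the assumption that past 3D positions of the query points are available, the constant-velocity extrapolation (a naive stand-in that ignores the goal text and is not the MolmoMotion model), and the average displacement error, which is a common trajectory metric but is not confirmed by the source as the one used on PointMotionBench.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Frame = List[Point3D]  # one 3D position per query point at a single time step


@dataclass
class ForecastQuery:
    """Hypothetical container for one task instance (illustrative names only)."""
    history: List[Frame]  # history[t][i]: observed position of point i at past step t
    goal: str             # language description of the intended goal


def constant_velocity_forecast(query: ForecastQuery, horizon: int) -> List[Frame]:
    """Naive stand-in predictor: extend each point along its last observed velocity.

    It ignores query.goal entirely, so it only illustrates the output shape
    (horizon frames, one 3D point per query point), not goal conditioning.
    """
    if len(query.history) < 2:
        raise ValueError("need at least two history frames to estimate velocity")
    prev, last = query.history[-2], query.history[-1]
    future: List[Frame] = []
    for step in range(1, horizon + 1):
        frame = [
            tuple(l + step * (l - p) for l, p in zip(last_pt, prev_pt))
            for last_pt, prev_pt in zip(last, prev)
        ]
        future.append(frame)
    return future


def average_displacement_error(pred: List[Frame], target: List[Frame]) -> float:
    """Mean Euclidean distance between predicted and target points over all steps."""
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += math.dist(p, t)
            count += 1
    if count == 0:
        raise ValueError("no points to compare")
    return total / count


if __name__ == "__main__":
    query = ForecastQuery(
        history=[
            [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
            [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
        ],
        goal="push the cup to the right",
    )
    for frame in constant_velocity_forecast(query, horizon=2):
        print(frame)
```

A goal-conditioned model would replace `constant_velocity_forecast` with a predictor that also consumes the video frames and the `goal` string; the sketch only fixes the shapes of the inputs and outputs so that different predictors can be scored with the same error function.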
