# GRAIL: A Data Generation Pipeline for Humanoid Whole-Body Manipulation Based on 3D Assets and Video Priors

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-03 08:00
- AIHOT score: 65
- AIHOT link: https://aihot.virxact.com/items/cmpyw41ob03vwsli3d928nqic
- Original link: https://arxiv.org/abs/2606.05160

## AI Summary

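GRAIL is a fully virtual digital generation pipeline that uses 3D assets, simulator-ready scenes, and video foundation model priors to synthesize humanoid robot interaction data without physical environment reconstruction or teleoperation. Because object geometry, camera parameters, metric scale, environment depth, and a robot-proportioned character are known before video generation, the pipeline better constrains 4D reconstruction and recovers metric 4D human-object interaction trajectories through model-based object tracking, human motion estimation, and interaction-aware optimization. GRAIL generates over 20,000 sequences covering pick-up, object manipulation, sitting, and terrain traversal. Egocentric visual policies trained only on GRAIL data were deployed on a Unitree G1 humanoid via sim-to-real transfer, achieving an 84% success rate on object pick-up and a 90% success rate on stair climbing.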

## Full Text

Scaling humanoid loco-manipulation requires robot-compatible demonstrations across diverse objects, whole-body motions, and scene geometries, but teleoperation and motion capture are difficult to scale because each collection depends on physical setups, instrumented actors, and robot operation. We present GRAIL, a digital generation pipeline that remains fully virtual until deployment: it composes 3D assets, simulator-ready scenes, and priors from video foundation models (VFMs) to synthesize interactions without rebuilding physical environments or teleoperating the robot. Rather than reconstructing unconstrained in-the-wild videos, GRAIL starts from fully specified 3D configurations in which object geometry, camera parameters, metric scale, environment depth, and a robot-proportioned character are known before video generation and reused during reconstruction. This privileged setup better conditions 4D recovery, allowing model-based object tracking, human motion estimation, and interaction-aware optimization to reconstruct metric 4D human-object interaction (HOI) trajectories with reduced depth ambiguity and morphology mismatch. We retarget the recovered motions to a humanoid robot and train complementary task-general trackers: an object-aware latent adaptor for manipulation and a scene-aware tracker for terrain traversal. GRAIL produces over 20,000 sequences spanning pick-up, object manipulation, sitting, and terrain traversal. Using only GRAIL-generated data, we train egocentric visual policies through a sim-to-real pipeline and deploy them on a Unitree G1 humanoid, achieving 84% real-world success on diverse object pick-up and 90% success on stair-climbing.
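
To illustrate why known camera parameters and metric depth reduce depth ambiguity, the sketch below uses a standard pinhole camera model. It is not taken from the paper and does not represent GRAIL's implementation; the names `Intrinsics`, `project`, and `backproject`, and the numeric values, are hypothetical and chosen only for illustration. Without depth, every 3D point along a camera ray maps to the same pixel; once intrinsics and metric depth are known, a pixel lifts to a single metric 3D point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics (hypothetical container, values in pixels)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def project(point, K):
    """Project a camera-frame 3D point (meters) onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy)


def backproject(u, v, depth_m, K):
    """Lift pixel (u, v) to a camera-frame 3D point.

    depth_m is the metric depth along the optical axis (z), assumed known.
    """
    x = (u - K.cx) / K.fx * depth_m
    y = (v - K.cy) / K.fy * depth_m
    return (x, y, depth_m)


if __name__ == "__main__":
    K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # assumed values

    # Two points on the same ray, at 1 m and 3 m, land on the same pixel:
    near = backproject(380.0, 300.0, 1.0, K)
    far = backproject(380.0, 300.0, 3.0, K)
    print(project(near, K), project(far, K))

    # With metric depth known, the pixel resolves to one 3D point:
    print(backproject(380.0, 300.0, 2.0, K))
```

In an unconstrained in-the-wild video neither `K` nor `depth_m` is available, so both must be estimated. A setup where they are specified in advance removes those unknowns from the reconstruction problem.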