# World Value Model (WVM) for Robotic Manipulation

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-06-23 08:00
- AIHOT score: 47
- AIHOT link: https://aihot.virxact.com/items/cmqrh5dvw0k28slp53fg3z2pm
- Original paper: https://arxiv.org/abs/2606.24742

## AI Summary

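Existing robotic value models are built on VLM backbones that lack temporal modeling capability. World models excel at temporal modeling and future planning, so the authors combine world models with value estimation to build the World Value Model (WVM). WVM achieves state-of-the-art (SOTA) Value-Order Correlation (VOC) on standard benchmarks. To complement evaluations that contain only expert data, the authors also introduce Suboptimal-Value-Bench (800 suboptimal trajectories with human annotations), on which WVM likewise remains SOTA. In policy learning, WVM improves manipulation performance across several policy extraction methods in both simulation and real-world deployment.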

## Main Text

Generalist value models play a pivotal role in scaling robotic policy learning from large-scale, mixed-quality data. Mathematically, accurate value estimation demands deep temporal understanding, requiring models both to ground the current belief in historical context and to plan over future outcomes. However, most existing robotic value models are built on Vision-Language Model (VLM) backbones that are pretrained primarily on static or temporally sparse visual observations, and therefore lack the temporal modeling capabilities that value estimation requires. Unlike VLMs, world models naturally excel at temporal modeling and future planning, making them strong foundations for learning generalizable value functions. Driven by this insight, we marry world models with value estimation to construct a new generalist robotic value model, the World Value Model (WVM), which provides accurate task-progress estimates for assessing data quality.

On standard benchmarks, WVM delivers state-of-the-art (SOTA) Value-Order Correlation (VOC) results. Because standard evaluation suites contain only expert data, we further introduce Suboptimal-Value-Bench, a multi-embodiment benchmark of 800 suboptimal trajectories with high-fidelity, human-labeled frame annotations. Our evaluations show that WVM maintains its SOTA performance on Suboptimal-Value-Bench, demonstrating its robustness on both expert and suboptimal data.

When deployed for policy learning, WVM improves manipulation performance across various policy extraction approaches in both simulated and real-world deployment, providing robust guidance for learning from mixed-quality data.
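The abstract says WVM helps "various policy extraction approaches" but does not name them here. As an illustration only, two widely used ways to turn a value or progress signal into a training signal for mixed-quality data are filtered behavior cloning (keep only steps that make progress) and advantage-weighted regression (AWR)-style sample weighting. The sketch below assumes the value model emits per-frame task progress roughly in [0, 1] and uses the k-step progress gain as an advantage proxy. The names `progress_advantages`, `awr_weights`, `filter_mask`, and the hyperparameters `horizon`, `beta`, and `max_weight` are hypothetical; this is not the paper's method or API.

```python
import math
from typing import Sequence


def progress_advantages(values: Sequence[float], horizon: int = 1) -> list[float]:
    """Hypothetical advantage proxy: value gain over the next `horizon` frames.

    Frames near the end look ahead to the final frame, so their gain shrinks
    toward zero. Assumes `values` are per-frame task-progress estimates.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    last = len(values) - 1
    return [values[min(t + horizon, last)] - values[t] for t in range(len(values))]


def awr_weights(
    advantages: Sequence[float], beta: float = 5.0, max_weight: float = 20.0
) -> list[float]:
    """AWR-style weights exp(beta * A), clipped at max_weight.

    Comparing the exponent against log(max_weight) first avoids float overflow
    for very large advantages.
    """
    cap = math.log(max_weight)
    return [max_weight if beta * a >= cap else math.exp(beta * a) for a in advantages]


def filter_mask(advantages: Sequence[float], threshold: float = 0.0) -> list[bool]:
    """Filtered-BC alternative: keep only steps whose advantage exceeds threshold."""
    return [a > threshold for a in advantages]
```

On the toy trajectory `[0.0, 0.3, 0.2, 0.6, 0.6]`, the step where progress drops from 0.3 to 0.2 gets a negative advantage, so `filter_mask` excludes it and `awr_weights` gives it a weight below 1. Clipping the weights is a common safeguard that keeps a few high-advantage samples from dominating the behavior-cloning loss.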
