# Vesta: A General-Purpose Embodied Reasoning Model

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-18 08:00
- AIHOT score: 46
- AIHOT link: https://aihot.virxact.com/items/cmqzujm9t017pslkihe6r4jku
- Original link: https://arxiv.org/abs/2606.20905

## AI Summary

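Vesta is a unified embodied generalist foundation model that integrates localization, spatial reasoning, navigation, and long-horizon planning in a single model. Using a large-scale spatial-perception dataset and a simple multimodal memory mechanism, Vesta on average outperforms individual SOTA baselines by more than 20% across a range of benchmarks, and outperforms an ensemble of per-category-best baselines by more than 10%. On real-world robot tasks that require memory and reasoning, Vesta raises task success by more than 35%, indicating that a single generalist model is preferable to a multi-model combination in both feasibility and scalability.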

## Main Text

Robots operating in open-world environments must seamlessly integrate localization, spatial reasoning, navigation, and long-horizon planning. While specialist models excel at individual tasks, deploying a multi-model stack is computationally expensive and prone to cascading errors. We present Vesta, a unified embodied generalist that consolidates these capabilities into a single foundation model. Our approach combines a diverse and massive curated corpus designed to induce spatial grounding and a simple multimodal memory harness that enables reasoning over extended time horizons. Across diverse benchmarks, Vesta on average beats individual SOTA baselines by >20% and beats an ensemble of per-category-best baselines by >10%, thus demonstrating that a generalist model can match or exceed specialists. On real-world robotic tasks requiring memory and reasoning, Vesta improves task success by >35%. Our work thus demonstrates that a single generalist is a feasible, scalable, and arguably preferable alternative to combining specialists.
