Vesta: A General-Purpose Embodied Reasoning Model

2026-06-18 08:00 · 15 days ago

AI Summary

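Vesta is a unified embodied generalist foundation model that integrates localization, spatial reasoning, navigation, and long-horizon planning into a single model. Using a large, curated corpus designed to induce spatial grounding and a simple multimodal memory harness, Vesta on average outperforms individual SOTA baselines by more than 20% across diverse benchmarks, and outperforms an ensemble of the per-category-best baselines by more than 10%. On real-world robotic tasks that require memory and reasoning, Vesta improves task success by more than 35%. The authors conclude that a single generalist model is a feasible, scalable, and arguably preferable alternative to combining multiple specialist models.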

Original abstract

Robots operating in open-world environments must seamlessly integrate localization, spatial reasoning, navigation, and long-horizon planning. While specialist models excel at individual tasks, deploying a multi-model stack is computationally expensive and prone to cascading errors. We present Vesta, a unified embodied generalist that consolidates these capabilities into a single foundation model. Our approach combines a diverse and massive curated corpus designed to induce spatial grounding and a simple multimodal memory harness that enables reasoning over extended time horizons. Across diverse benchmarks, Vesta on average beats individual SOTA baselines by >20% and beats an ensemble of per-category-best baselines by >10%, thus demonstrating that a generalist model can match or exceed specialists. On real-world robotic tasks requiring memory and reasoning, Vesta improves task success by >35%. Our work thus demonstrates that a single generalist is a feasible, scalable, and arguably preferable alternative to combining specialists.

HuggingFace Daily Papers (trending community papers)

Read the original · arxiv.org