AlloSpatial: An Agentic Framework for Allocentric Spatial Reasoning in Foundation Models

2026-06-08 08:00 · 25 days ago

AI summary

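Multimodal foundation models cannot transform egocentric observations into a global allocentric spatial representation, which leaves their spatial reasoning about the physical world fragile. AlloSpatial introduces World2Mind, a cognitive mapping sandbox that converts observations into Allocentric-Spatial Trees (ASTs) and route maps, which support queries on object topology, geometric relations, and more. A Spatial Reasoning Harness handles tool-use judgment and geometry-semantic arbitration, and cold-start reinforcement learning internalizes the process in Qwen3-VL. On VSI-Bench and MindCube, AlloSpatial improves proprietary models by 5%-18% without training; ASTs alone support strong reasoning even without visual input; and the trained agents outperform larger general-purpose models and competitive baselines.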

Original abstract · untranslated

Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocentric observations into a global allocentric spatial representation. To address this, we propose AlloSpatial, an agentic framework for allocentric spatial cognition in foundation models. AlloSpatial introduces World2Mind, a plug-and-play cognitive mapping sandbox that converts egocentric observations into structured allocentric priors, including Allocentric-Spatial Trees (ASTs) and route maps that support querying object topology, geometric relations, passability, and trajectories. To utilize these priors reliably under noisy reconstruction and ambiguous visual evidence, AlloSpatial introduces a Spatial Reasoning Harness for tool-use judgment, modality-decoupled cue collection, and geometry-semantic arbitration. We further internalize this process in Qwen3-VL through cold-start reinforcement learning with a harness-gated trajectory-level reward. Experiments on VSI-Bench and MindCube show that AlloSpatial improves proprietary models by 5%-18% in a training-free setting, while ASTs alone support strong spatial reasoning even when visual inputs are removed. The trained AlloSpatial agents further outperform larger general-purpose models and competitive spatial baselines, suggesting that structured allocentric representations, active tool use, and verifiable reasoning offer a promising route toward spatially capable foundation models.
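
To make the egocentric-to-allocentric step concrete, here is a minimal 2D sketch of the underlying geometry: a point observed relative to a camera is mapped into a shared world frame by a rotation and a translation. This is not the World2Mind implementation, which the abstract does not describe in detail. The function name `ego_to_allo`, the pose convention (yaw in radians, measured counter-clockwise from the world x-axis), and the example coordinates are all assumptions made for illustration.

```python
import math


def ego_to_allo(cam_x, cam_y, cam_yaw, forward, left):
    """Map a camera-relative point into world (allocentric) coordinates.

    Assumed convention (illustrative, not taken from the paper):
    `forward` is the distance along the camera's viewing direction,
    `left` is the distance to the camera's left, and `cam_yaw` is the
    viewing direction in radians, counter-clockwise from the world x-axis.
    """
    world_x = cam_x + forward * math.cos(cam_yaw) - left * math.sin(cam_yaw)
    world_y = cam_y + forward * math.sin(cam_yaw) + left * math.cos(cam_yaw)
    return world_x, world_y


# Two views of the same (hypothetical) chair from different camera poses.
view_a = ego_to_allo(0.0, 0.0, 0.0, forward=2.0, left=1.0)
view_b = ego_to_allo(2.0, -1.0, math.pi / 2, forward=2.0, left=0.0)

# Both observations should land on the same world position,
# up to floating-point error.
same_object = math.dist(view_a, view_b) < 1e-9
```

Once every observation lives in one world frame, relations such as distance or relative direction between objects no longer depend on where the camera stood, which is the property an allocentric representation is meant to provide.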

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
