# 空间基础模型基准测试 SpatialBench：你的模型是全能选手吗？

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-26 08:00
- AIHOT 分数：68
- AIHOT 链接：https://aihot.virxact.com/items/cmpnfrc8n0x77sl017btp3eu3
- 原文链接：https://arxiv.org/abs/2605.27367

## AI 摘要

空间基础模型虽在标准数据集上表现优异，但其在不同任务、视角、场景、输入密度和硬件下的真实泛化能力尚未得到全面评估。为此，研究者提出了跨范式、多领域的基准测试 SpatialBench，包含19个数据集、546个场景，覆盖5个空间领域。该基准对41个模型在6种范式和4种输入密度下进行了评估，发现当前模型尚未达到“全能”水平。研究表明，全上下文注意力能最大化精度，有界内存策略可提升长序列扩展能力，且在具身任务中，严格的领域对齐与数据质量远比单纯增加数据量更重要。此外，研究还引入了大规模数据集 DA-Next-5M 及强基线模型 DA-Next。

## 正文

While spatial foundation models have demonstrated impressive performance on standard datasets, a critical question remains: are they truly all-round players capable of generalizing robustly across diverse downstream tasks, arbitrary viewpoints, shifting scene domains, varying input densities, and specific hardware constraints? Answering this overarching question requires a holistic assessment, yet current models are mainly evaluated on specific domains for which they were specifically designed or trained. Such evaluations are intrinsically limited by narrow paradigm coverage, limited scene domains, and arbitrary frame sampling, making it fundamentally difficult to assess their true generalization capabilities. To address this gap, we present SpatialBench, a cross-paradigm, domain-diverse benchmark for spatial foundation models with deterministic sampling. SpatialBench features unprecedented scale and rigorous deterministic design, comprising 19 datasets and 546 scenes across 5 diverse spatial domains. It comprehensively evaluates 41 models across 6 paradigms on 5 task suites under 4 different input density settings. Our extensive evaluation reveals that current models are not yet all-round players, and uncovers crucial insights for future advancement. Specifically, we demonstrate that full-context attention maximizes accuracy while bounded-memory strategies unlock long-sequence scalability. Moreover, our empirical evaluations in challenging embodied and egocentric tasks demonstrate that strict domain alignment and high data quality are far more critical to performance than simple dataset scaling. Furthermore, to address the largest data gap identified in our analysis, we go beyond evaluation by introducing a large-scale dataset, DA-Next-5M, and a strong baseline model, DA-Next, pushing the boundaries of spatial representation learning.
