# Spatial Competence Benchmark: SCBench

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-03-05 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo0219nc004ssloz8ppv3cr3
- Original link: https://arxiv.org/abs/2604.09594

## AI Summary

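The research team has released SCBench, a spatial competence benchmark that moves beyond existing evaluations, which are limited to isolated 3D transformations or visual question answering. It defines three hierarchical capability buckets and requires models to output executable actions that are verified by deterministic checkers or simulators. In testing, the accuracy of three frontier models decreases monotonically as task difficulty increases. Capping the number of output tokens shows that accuracy gains concentrate in the low-budget range and saturate quickly, and the dominant failure mode is geometry that is locally plausible but violates global constraints. The team has open-sourced the task generators, verifiers, and visualisation tooling.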

## Main Text

Spatial competence is the quality of maintaining a consistent internal representation of an environment and using it to infer discrete structure and plan actions under constraints. Prevailing spatial evaluations for large models are limited to probing isolated primitives through 3D transformations or visual question answering. We introduce the Spatial Competence Benchmark (SCBench), spanning three hierarchical capability buckets whose tasks require executable outputs verified by deterministic checkers or simulator-based evaluators. On SCBench, three frontier models exhibit monotonically decreasing accuracy up the capability ladder. Sweeping output-token caps shows that accuracy gains concentrate at low budgets and saturate quickly, and failures are dominated by locally plausible geometry that breaks global constraints. We release the task generators, verifiers, and visualisation tooling.
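To make the idea of a deterministic checker concrete, here is a minimal sketch in Python. It is not SCBench's verifier or any of its tasks: the grid-path task, the `Verdict` type, and the `check_path` function are all hypothetical, invented only to illustrate how an executable output can be checked mechanically and how a path can be locally plausible while breaking a global constraint.

```python
from dataclasses import dataclass

Cell = tuple[int, int]  # (row, column); hypothetical task encoding


@dataclass(frozen=True)
class Verdict:
    local_ok: bool   # every cell is free and every step moves to a 4-neighbour
    global_ok: bool  # path starts at start, ends at goal, never revisits a cell

    @property
    def passed(self) -> bool:
        return self.local_ok and self.global_ok


def check_path(grid: list[str], start: Cell, goal: Cell, path: list[Cell]) -> Verdict:
    """Deterministically verify a proposed path on a grid where '#' is a wall."""
    rows, cols = len(grid), len(grid[0])

    def is_free(cell: Cell) -> bool:
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#"

    # Local checks: each cell is usable and each move is a single unit step.
    cells_ok = bool(path) and all(is_free(cell) for cell in path)
    steps_ok = all(
        abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1 for a, b in zip(path, path[1:])
    )

    # Global checks: endpoints match and the path is simple (no repeated cell).
    global_ok = (
        bool(path)
        and path[0] == start
        and path[-1] == goal
        and len(set(path)) == len(path)
    )
    return Verdict(local_ok=cells_ok and steps_ok, global_ok=global_ok)


grid = ["...", ".#.", "..."]
start, goal = (0, 0), (2, 2)

# A valid path around the wall.
good = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]

# Every step is legal, but the path revisits (0, 0), so it fails globally.
bad = [(0, 0), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]

good_verdict = check_path(grid, start, goal, good)
bad_verdict = check_path(grid, start, goal, bad)
```

Reporting the local and global checks separately is a design choice of this sketch. It lets a failure such as `bad` be recorded as "locally plausible, globally invalid" rather than as a bare failure, which is the kind of failure the abstract describes as dominant.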