# Reason, then Re-reason: Cross-View Revisiting Improves Spatial Reasoning

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-10 08:00
- AIHOT score: 64
- AIHOT link: https://aihot.virxact.com/items/cmq937l3v087fslld7esskesb
- Original link: https://arxiv.org/abs/2606.11683

## AI Summary

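Spatial reasoning over egocentric video is limited by the evidence the camera actually observes, and existing single-turn inference methods fall back on semantic priors, which cannot resolve geometric ambiguity. The paper proposes ReRe, a training-free, inference-time framework: in the Reason phase, an MLLM forms a spatial hypothesis from the original video; in the Re-reason phase, it verifies or revises that hypothesis by observing a synthesized novel-view video. A Geometry-to-Video pipeline renders elevated, oblique, scene-spanning novel views from predicted 3D geometry while preserving the MLLM's native video interface. On VSI-Bench and STI-Bench, ReRe substantially improves open-source MLLMs, rivaling the best proprietary models.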

## Main Text

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue that spatial reasoning should be revisitable: conclusions formed under limited evidence should remain open to revision when complementary viewpoints become available. Building on this insight, we propose Reason, then Re-reason (ReRe), a training-free, inference-time framework with two phases: in the Reason Phase, an MLLM forms a spatial hypothesis from the original video; in the Re-reason Phase, it verifies or revises the hypothesis by observing a synthesized novel-view video. To enable effective cross-view revisiting, we design a Geometry-to-Video pipeline that renders strategically complementary novel views from predicted 3D geometry. These views feature an elevated, oblique perspective with scene-spanning coverage, while preserving the MLLM's native video interface without architectural modifications. Extensive evaluations on VSI-Bench and STI-Bench demonstrate that ReRe substantially boosts open-source MLLMs to rival proprietary state-of-the-art performance. Project page: https://zhenjiemao.github.io/ReRe/
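The two-phase control flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `rere`, `ask_mllm`, `render_novel_view`, `toy_mllm`, and `toy_renderer` are hypothetical, frames are represented as plain strings, the prompt wording is invented, and the sketch assumes the Re-reason phase sees only the synthesized video. A real system would replace the two callables with an actual MLLM call and a geometry-based renderer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Frame = str  # placeholder for a video frame; a real system would use image arrays


@dataclass
class Answer:
    text: str      # final answer after the Re-reason phase
    revised: bool  # True if the final answer differs from the initial hypothesis


def rere(
    question: str,
    video: Sequence[Frame],
    ask_mllm: Callable[[str, Sequence[Frame]], str],
    render_novel_view: Callable[[Sequence[Frame]], Sequence[Frame]],
) -> Answer:
    """Hypothetical two-phase loop: reason on the original video, then re-reason."""
    # Reason phase: form a spatial hypothesis from the original egocentric video.
    hypothesis = ask_mllm(question, video)

    # Geometry-to-Video step (assumed interface): synthesize a complementary view.
    novel_video = render_novel_view(video)

    # Re-reason phase: verify or revise the hypothesis against the new viewpoint.
    prompt = (
        f"{question}\n"
        f"Initial answer: {hypothesis}\n"
        "Verify or revise this answer using the new viewpoint."
    )
    final = ask_mllm(prompt, novel_video)
    return Answer(text=final, revised=(final != hypothesis))


# Toy stand-ins so the sketch runs without any model or renderer.
def toy_mllm(prompt: str, frames: Sequence[Frame]) -> str:
    # Pretend the novel view disambiguates the scene and changes the answer.
    return "right" if any(f.startswith("novel:") for f in frames) else "left"


def toy_renderer(frames: Sequence[Frame]) -> Sequence[Frame]:
    # Tag each frame to mark it as coming from the synthesized viewpoint.
    return [f"novel:{f}" for f in frames]
```

Calling `rere("Is the chair left or right of the table?", ["f0", "f1"], toy_mllm, toy_renderer)` runs both phases with the toy stand-ins. Because both phases go through the same `ask_mllm(prompt, frames)` call, the sketch mirrors the abstract's point that the MLLM's native video interface is left unchanged.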