DR-MV3D, a Dense Verifiable Reward Framework: Multi-View 3D Reasoning Driven by Global Maps and Local Views

2026-06-22 08:00 · 11 days ago

AI Summary

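Multi-view 3D visual question answering (MV3D-VQA) requires integrating partial observations into a coherent 3D scene representation and planning informative viewpoints. Current multimodal LLMs are typically trained with sparse, answer-level supervision, which leads to inconsistent cross-view reasoning and brittle view selection. DR-MV3D is a map-grounded framework of dense, verifiable rewards that decomposes the task into allocentric global map construction, question-conditioned view-trajectory planning, and egocentric grounding for answer prediction. It introduces a global consistency reward, which aligns the predicted map with geometry-consistent pseudo targets from frozen 3D vision foundation models (e.g., VGGT and SAM3), and a local trajectory reward, which supervises ordered viewpoint selection. The full pipeline is trained with trajectory-level policy optimization (GRPO). On MindCube, VSI-Bench, and BLINK (MV), DR-MV3D improves over strong multi-image baselines, supporting the effectiveness of process-level dense supervision.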

Original abstract · untranslated

Multi-view 3D Visual Question Answering (MV3D-VQA) requires integrating partial observations into a coherent 3D scene representation and selecting informative viewpoints for multi-step spatial reasoning. However, current multimodal LLMs are typically trained with sparse, answer-level supervision, which often yields inconsistent cross-view reasoning and brittle view selection. We present DR-MV3D (Dense Reward for MV3D-VQA), a map-grounded learning framework that provides dense, verifiable rewards to supervise the reasoning process. Our approach decomposes MV3D-VQA into (i) allocentric global map construction, (ii) question-conditioned view-trajectory planning, and (iii) egocentric grounding for answer prediction. To make intermediate steps learnable without manual annotations, we introduce two rewards: a global consistency reward that aligns the predicted map with geometry-consistent pseudo targets from frozen 3D vision foundation models (e.g., VGGT + SAM3), and a local trajectory reward that supervises ordered viewpoint selection. We optimize the full pipeline with trajectory-level policy optimization (GRPO). Experiments on MindCube, VSI-Bench, and BLINK (MV) show that DR-MV3D consistently improves over strong multi-image baselines, supporting the effectiveness of process-level dense supervision for multi-view 3D reasoning.
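The abstract says the dense rewards are optimized with GRPO but does not give the formula. The sketch below is one way the pieces could fit together, not the authors' implementation. It assumes the answer, global consistency, and local trajectory rewards are already computed per rollout, and that they are combined by a weighted sum. The field names and the weights `w_map` and `w_traj` are hypothetical. The only part taken from GRPO itself is the group-relative step, which standardizes rewards across rollouts sampled for the same question.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RolloutReward:
    """Per-rollout reward terms; field names are hypothetical, not from the paper."""
    answer: float      # 1.0 if the final answer is correct, else 0.0
    global_map: float  # agreement of the predicted map with pseudo targets, in [0, 1]
    trajectory: float  # score for the ordered viewpoint selection, in [0, 1]


def total_reward(r, w_map=0.5, w_traj=0.5):
    # Assumption: a weighted sum. The abstract does not state how terms are combined.
    return r.answer + w_map * r.global_map + w_traj * r.trajectory


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


# Four rollouts sampled for the same question (made-up illustrative values).
group = [
    RolloutReward(answer=1.0, global_map=0.8, trajectory=0.6),
    RolloutReward(answer=1.0, global_map=0.2, trajectory=0.1),
    RolloutReward(answer=0.0, global_map=0.7, trajectory=0.5),
    RolloutReward(answer=0.0, global_map=0.1, trajectory=0.0),
]
advantages = group_relative_advantages([total_reward(r) for r in group])
```

With answer-only rewards, the first two rollouts would receive the same advantage, as would the last two. Under the assumed dense terms they are separated by how well the intermediate map and view trajectory were formed, which is the kind of process-level signal the abstract argues for.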

HuggingFace Daily Papers (community trending papers)

Source: arxiv.org
