# SceneAligner: Floorplan Localization via 3D Reconstruction

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-21 08:00
- AIHOT score: 48
- AIHOT link: https://aihot.virxact.com/items/cmpgeo9hw0esksljwpp06pywd
- Original link: https://arxiv.org/abs/2605.22581

## AI Summary

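SceneAligner is a floorplan localization method based on 3D reconstruction. To address the limits that existing techniques place on environment scale and map format, it reconstructs a gravity-aligned 3D scene from an unconstrained image collection and projects it into a 2D density map that serves as a floorplan proxy. The proxy is then aligned with the input floorplan through a 2D similarity transform. To bridge the visual gap between density maps and architectural floorplans, the method introduces a cross-modal learning mechanism that uses a 2D foundation model for semantic alignment while preserving structural consistency. Experiments show that it substantially outperforms prior methods across a variety of scenes and remains effective with extremely sparse input, such as a single image. The code and data will be released publicly to support further research.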

## Main Text

Many public buildings provide floorplans with a "you are here" indicator to help visitors orient themselves. Floorplan localization seeks to computationally replicate this capability by determining where visual observations were captured within a floorplan. However, existing methods typically assume controlled small-scale environments and precise vectorized floorplans, limiting their ability to operate in large-scale buildings and rasterized floorplans. In this work, we present an approach for performing floorplan localization in the wild by grounding the task in a reconstructed 3D representation of the scene. Given an unconstrained image collection, our method reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy. Floorplan localization is then formulated as aligning this proxy with the input floorplan via a 2D similarity transform. To bridge the appearance gap between density maps and architectural floorplans, we adapt a 2D foundation model to learn cross-modal correspondences, introducing a fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency. Extensive experiments demonstrate substantial improvements over prior methods, including in extremely sparse settings with as little as a single input image. Our code and data will be publicly available.
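The two geometric steps named in the abstract, projecting a gravity-aligned reconstruction into a 2D density map and aligning it to a floorplan with a 2D similarity transform, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `density_map` and `fit_similarity_2d`, the `cell_size` parameter, and the assumption that z is the gravity axis are all hypothetical. The alignment uses the classical closed-form least-squares (Umeyama-style) fit as a stand-in, and it assumes point correspondences are already given, whereas the paper obtains them from a fine-tuned 2D foundation model.

```python
import numpy as np


def density_map(points_xyz, cell_size=0.1):
    """Project 3D points onto the ground plane and count points per grid cell.

    Assumption: the point cloud is gravity-aligned with z pointing up,
    so dropping z yields a top-down view. Returns the normalized grid
    (rows index y, columns index x) and the world-space origin of cell (0, 0).
    """
    xy = np.asarray(points_xyz, dtype=np.float64)[:, :2]
    origin = xy.min(axis=0)
    idx = np.floor((xy - origin) / cell_size).astype(int)
    n_cols, n_rows = idx.max(axis=0) + 1
    grid = np.zeros((n_rows, n_cols), dtype=np.float64)
    # np.add.at accumulates correctly when several points share a cell.
    np.add.at(grid, (idx[:, 1], idx[:, 0]), 1.0)
    if grid.max() > 0:
        grid /= grid.max()
    return grid, origin


def fit_similarity_2d(src, dst):
    """Least-squares 2D similarity transform with dst ~= s * R @ src + t.

    Closed-form Umeyama-style solution; `src` and `dst` are (N, 2) arrays
    of corresponding points (hypothetical inputs for this sketch).
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    sign = -1.0 if np.linalg.det(U @ Vt) < 0 else 1.0
    D = np.array([1.0, sign])
    R = U @ np.diag(D) @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = (S * D).sum() / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In this sketch, matched pixel locations between the density map and the floorplan image would be passed as `src` and `dst`; the recovered `s`, `R`, and `t` then map any density-map coordinate, including a camera position, into floorplan coordinates. Real correspondences contain outliers, so a robust wrapper such as RANSAC around the fit would likely be needed in practice.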
