# Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-28 08:00
- AIHOT score: 54
- AIHOT link: https://aihot.virxact.com/items/cmpqnvse606ojslnovxcbbgy5
- Original link: https://arxiv.org/abs/2605.30093

## AI Summary

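2D foundation features extracted from self-supervised vision models and diffusion models are effective for semantic correspondence, but they lack explicit 3D awareness and tend to confuse the two sides of symmetric objects, repeated parts, and visually similar structures. The new framework introduces priors from 3D foundation models: it uses SAM3D to estimate object geometry and pose, then refines the estimate through render-and-compare optimization. Next, based on the estimated pose, it renders PartField descriptors from the reconstructed geometry onto the image plane, producing geometry-aware feature maps that complement DINO and Stable Diffusion features, while geodesic distances on the reconstructed shape are used to reliably filter candidate correspondences. The filtered matches then serve as supervision for training a lightweight adapter. Unlike earlier post-training methods that depend on pose annotations and coarse geometry, this framework obtains instance-level 3D structure automatically and uses it to guide correspondence learning. Experiments show that the method improves semantic correspondence performance while reducing manual geometric supervision.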

## Main Text

Foundation features from self-supervised vision models and text-to-image diffusion models have proven effective for semantic correspondence estimation. However, because these features are learned primarily from 2D image objectives, they lack explicit 3D awareness and often confuse symmetric object sides, repeated parts, and visually similar structures that are distinct in 3D. We introduce a 3D-aware post-training framework that goes beyond available 2D foundation features by incorporating priors from 3D foundation models. Given an image, our method uses SAM3D to estimate object geometry and pose, and refines the pose through render-and-compare optimization. Subsequently, we render PartField descriptors from the reconstructed geometry into the image plane based on the estimated object pose. The resulting geometry-aware feature maps complement DINO and Stable Diffusion features, while geodesic distances on the reconstructed shapes enable reliable filtering of candidate correspondences. We use the filtered matches as supervision to train a lightweight adapter on top of DINO and Stable Diffusion for semantic correspondence. In contrast to prior post-training approaches that require pose annotations and rely on coarse spherical geometry, our method automatically obtains instance-specific 3D structure and uses it to guide correspondence learning. Experiments show that our approach improves semantic correspondence over prior methods while reducing manual geometric supervision. Code and model can be found at https://github.com/GenIntel/3D-SC.
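
The abstract does not spell out how geodesic distances are used to filter candidate correspondences, so the following is only a minimal sketch under stated assumptions rather than the authors' implementation. It assumes each candidate match carries two vertex indices on the target shape: the vertex suggested by the 2D features and the vertex suggested by the geometry-aware features. A candidate is kept only if the two lie within a geodesic threshold of each other. Geodesics are approximated by shortest paths along mesh edges (Dijkstra), and the names `geodesic_distances`, `filter_matches`, and `max_geodesic` are hypothetical.

```python
import heapq


def geodesic_distances(num_vertices, edges, source):
    """Shortest-path distances along mesh edges (an approximation of surface geodesics)."""
    adjacency = [[] for _ in range(num_vertices)]
    for u, v, length in edges:
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))
    dist = [float("inf")] * num_vertices
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adjacency[u]:
            nd = d + length
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def filter_matches(candidates, num_vertices, edges, max_geodesic):
    """Keep candidates whose two vertex estimates agree within max_geodesic.

    candidates: list of (match_id, vertex_from_2d_features, vertex_from_geometry).
    This agreement criterion is an assumption made for illustration.
    """
    kept = []
    cache = {}  # reuse distance fields for repeated geometry vertices
    for match_id, v_feat, v_geom in candidates:
        if v_geom not in cache:
            cache[v_geom] = geodesic_distances(num_vertices, edges, v_geom)
        if cache[v_geom][v_feat] <= max_geodesic:
            kept.append(match_id)
    return kept


# Toy "shape": a ring of 8 vertices with unit-length edges, standing in for a mesh.
NUM_VERTICES = 8
EDGES = [(i, (i + 1) % NUM_VERTICES, 1.0) for i in range(NUM_VERTICES)]

CANDIDATES = [
    ("a", 1, 0),  # one edge away from the geometry estimate
    ("b", 4, 0),  # opposite side of the ring, like a left/right flip
    ("c", 7, 0),  # one edge away via wrap-around
]
KEPT = filter_matches(CANDIDATES, NUM_VERTICES, EDGES, max_geodesic=2.0)
print(KEPT)  # ['a', 'c']
```

On the toy ring, candidate `b` is rejected because its two vertex estimates sit on opposite sides of the shape, which mirrors the symmetric-side confusion the abstract describes. A real mesh would supply actual edge lengths, and an exact geodesic solver could replace the edge-graph approximation.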