触而知见:触觉驱动的材料区域视觉定位
阅读原文· arxiv.org针对触觉定位任务,本文提出通过密集跨模态特征交互学习局部视觉-触觉对齐的模型,生成触觉显著性图实现触摸条件材料分割。为克服现有数据集特写镜头单一、多样性不足的限制,研究引入野外多材料场景图像及材料多样性配对策略,将触觉样本与视觉多样但触觉一致的图像对齐以增强上下文定位能力。此外构建了两个新的触觉材料分割数据集用于定量评估,实验结果表明该方法显著优于现有视觉-触觉方法。
We address the problem of tactile localization, where the goal is to identify image regions that share the same material properties as a tactile input. Existing visuo-tactile methods rely on global alignment and thus fail to capture the fine-grained local correspondences required for this task. The challenge is amplified by existing datasets, which predominantly contain close-up, low-diversity images. We propose a model that learns local visuo-tactile alignment via dense cross-modal feature interactions, producing tactile saliency maps for touch-conditioned material segmentation. To overcome dataset constraints, we introduce: (i) in-the-wild multi-material scene images that expand visual diversity, and (ii) a material-diversity pairing strategy that aligns each tactile sample with visually varied yet tactilely consistent images, improving contextual localization and robustness to weak signals. We also construct two new tactile-grounded material segmentation datasets for quantitative evaluation. Experiments on both new and existing benchmarks show that our approach substantially outperforms prior visuo-tactile methods in tactile localization.