# SAM2Matting：通用图像和视频抠图

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-06-25 08:00
- AIHOT 分数：47
- AIHOT 链接：https://aihot.virxact.com/items/cmr0svezl04s5slolmv4pu6ha
- 原文链接：https://arxiv.org/abs/2606.27339

## AI 摘要

SAM2Matting 是一种追踪器到抠图的框架，通过为基础追踪器（如 SAM2、SAM3）添加区域提议桥和专用抠图头，将视频对象分割追踪器扩展为高保真视频抠图系统。它解耦了高层时序理解与底层细粒度细节处理。尽管仅使用图像训练，SAM2Matting 在视频抠图上实现了新 SOTA，支持多种提示类型，保持强时间一致性，并在人物及野外场景中展现出鲁棒的泛化能力。

## 正文

Despite impressive advances in image matting, video matting remains challenging due to the inherent gap between high-level tracking, which requires frame-wise understanding, and low-level matting, which focuses on extremely fine-grained details. Existing methods attempt this with expensive and narrowly-scoped video matting datasets, which may limit out-of-domain generalization and compromise tracking robustness. We rethink the paradigm with SAM2Matting, a tracker-to-matting framework that advances VOS trackers to high-fidelity video matting. Specifically, it decouples the task by enhancing a foundational tracker (e.g., SAM2, SAM3) with a region-proposal bridge and dedicated matting heads, enabling the uncompromised tracker to handle temporal consistency while the matting components resolve fine-grained details. Notably, despite being trained only on images, SAM2Matting establishes new state-of-the-art performance on video matting, supports diverse prompt types, maintains strong temporal consistency, and demonstrates robust generalization across both human-centric and in-the-wild scenarios.