# ROSE：面向检索的分割增强

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-04-15 08:00
- AIHOT 链接：https://aihot.virxact.com/items/cmo0vpnos02w2sli279v1hob0
- 原文链接：https://arxiv.org/abs/2604.14147

## AI 摘要

研究团队提出即插即用框架ROSE，通过引入互联网检索增强生成、文本与视觉提示增强及WebSense智能调度四大模块，解决多模态大语言模型在分割训练数据外新颖实体与需实时信息新兴实体时的知识滞后问题。同步构建的NEST基准测试用于评估此类场景。实验显示，ROSE在NEST基准上较Gemini-2.0 Flash检索基线提升19.2 gIoU，显著增强模型对实时网络信息的利用能力。

## 正文

Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities due to their inability to incorporate up-to-date knowledge. To address this challenge, we introduce the Novel Emerging Segmentation Task (NEST), which focuses on segmenting (i) novel entities that MLLMs fail to recognize due to their absence from training data, and (ii) emerging entities that exist within the model's knowledge but demand up-to-date external information for accurate recognition. To support the study of NEST, we construct a NEST benchmark using an automated pipeline that generates news-related data samples for comprehensive evaluation. Additionally, we propose ROSE: Retrieval-Oriented Segmentation Enhancement, a plug-and-play framework designed to augment any MLLM-based segmentation model. ROSE comprises four key components. First, an Internet Retrieval-Augmented Generation module is introduced to employ user-provided multimodal inputs to retrieve real-time web information. Then, a Textual Prompt Enhancer enriches the model with up-to-date information and rich background knowledge, improving the model's perception ability for emerging entities. Furthermore, a Visual Prompt Enhancer is proposed to compensate for MLLMs' lack of exposure to novel entities by leveraging internet-sourced images. To maintain efficiency, a WebSense module is introduced to intelligently decide when to invoke retrieval mechanisms based on user input. Experimental results demonstrate that ROSE significantly boosts performance on the NEST benchmark, outperforming a strong Gemini-2.0 Flash-based retrieval baseline by 19.2 in gIoU.
