Your Embedding Model Is More SMART Than You Think
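SMART is a framework that aims to unlock the latent multi-vector capabilities of standard single-vector embedding models. By applying direct late interaction at inference time over the frozen hidden states produced by standard contrastive training, it delivers a plug-and-play performance gain. The study shows that SMART improves multimodal retrieval performance, including for state-of-the-art models, with further gains on MMEB-V2. Simple, lightweight post-training not only saves time and compute but also lets a single-vector model surpass the strongest current multi-vector models on visual document retrieval. The project's code and weights are open-sourced on GitHub.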
Multimodal retrieval relies heavily on single-vector retrievers, which compress rich token sequences into a single global representation. While efficient, they discard fine-grained, local evidence critical for dense retrieval tasks. Multi-vector approaches were introduced as a solution, but they require dedicated training, and many neglect the need for a global summary representation. To address this, we introduce SMART, a framework that unlocks the latent multi-vector capabilities of standard single-vector models. We first demonstrate that standard contrastive training on the pooled embedding implicitly shapes the retrieval geometry of the preceding hidden states via gradient flow. By applying direct late interaction over these frozen hidden states during inference, SMART acts as a plug-and-play upgrade that consistently improves performance across diverse modalities, further improving even state-of-the-art models on MMEB-V2. We also show that simple, lightweight post-training with SMART not only saves time and compute but also yields further gains on visual document retrieval, allowing a single-vector model to outperform state-of-the-art multi-vector counterparts. Ultimately, SMART offers both a highly efficient inference enhancement and a powerful finetuning technique for multimodal retrieval. We open-source our code and weights at https://github.com/HanSolo9682/SMART.
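To make the contrast between pooled scoring and late interaction concrete, here is a minimal sketch in plain Python. It is not taken from the SMART repository. It assumes a ColBERT-style MaxSim score (sum over query tokens of the best cosine match among document tokens) and mean pooling for the single-vector baseline; the abstract does not specify the pooling method, the scoring function, or which hidden states are used. All function names are hypothetical, and the token vectors are toy values standing in for frozen hidden states.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(token_states):
    """Assumed pooling: average the token vectors into one global vector."""
    n = len(token_states)
    return [sum(column) / n for column in zip(*token_states)]


def single_vector_score(query_states, doc_states):
    """Cosine similarity between the pooled query and pooled document."""
    q = l2_normalize(mean_pool(query_states))
    d = l2_normalize(mean_pool(doc_states))
    return dot(q, d)


def late_interaction_score(query_states, doc_states):
    """Assumed MaxSim: each query token takes its best document token; sum the maxima."""
    q_norm = [l2_normalize(q) for q in query_states]
    d_norm = [l2_normalize(d) for d in doc_states]
    return sum(max(dot(q, d) for d in d_norm) for q in q_norm)


# Toy "hidden states": two tokens, two dimensions each.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_exact = [[1.0, 0.0], [0.0, 1.0]]    # matches each query token exactly
doc_blurred = [[1.0, 1.0], [1.0, 1.0]]  # same pooled direction, no exact token match

for name, doc in [("doc_exact", doc_exact), ("doc_blurred", doc_blurred)]:
    print(name, single_vector_score(query, doc), late_interaction_score(query, doc))
```

In this toy case both documents pool to the same direction, so the single-vector score cannot separate them, while the token-level score ranks `doc_exact` (2.0) above `doc_blurred` (about 1.414). This illustrates the kind of local evidence that pooling discards; it makes no claim about how SMART combines or weights the two signals.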