# KaLM-Reranker-V1: A Fast but Not Late-Interaction Compressed-Document Reranker

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-22 08:00
- AIHOT score: 48
- AIHOT link: https://aihot.virxact.com/items/cmqq2kjqi060mslp58qty0ayl
- Original link: https://arxiv.org/abs/2606.22807

## AI Summary

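KaLM-Reranker-V1 is a fast but not late-interaction (FBNL) reranker built on an encoder-decoder architecture. The encoder pre-encodes passages with Matryoshka embedding pooling; the decoder models the system instruction, user instruction, and query intent; cross-attention then captures relevance between the query and the passage. This decouples query and passage computation while keeping the model efficient. The model comes in three sizes by activated parameters: Nano (0.27B), Small (1B), and Large (4B). On BEIR it reaches state-of-the-art performance, on par with the Qwen3-Reranker series; on MIRACL it performs strongly despite not being extensively trained on multilingual data; on LMEB, the 0.27B Nano model is competitive with 7-12B embedding models.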

## Main Text

As retrieval systems scale, high-quality reranking becomes increasingly important. However, most existing rerankers, whether encoder-based or decoder-based, jointly encode the query and passage, tightly coupling their computation and limiting deployment efficiency as well as flexibility. We present KaLM-Reranker-V1, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling. Built on an encoder-decoder architecture, KaLM-Reranker-V1 uses the encoder to pre-encode passages with Matryoshka embedding pooling, while the decoder models the system instruction, user instruction, and query intent; cross-attention then captures relevance between the query context and passage representations. This design makes KaLM-Reranker-V1 efficient through decoupled passage encoding, yet not late interaction, by preserving rich relevance modeling through cross-attention. We instantiate KaLM-Reranker-V1 in three sizes, Nano, Small, and Large, with 0.27B, 1B, and 4B activated parameters, respectively. Extensive experiments on BEIR, MIRACL, and LMEB demonstrate that KaLM-Reranker-V1 achieves strong reranking performance with superior efficiency. On BEIR, KaLM-Reranker-V1 achieves state-of-the-art performance, on par with strong industrial models such as the Qwen3-Reranker series; on MIRACL, despite not being extensively trained on multilingual data, KaLM-Reranker-V1 still shows excellent reranking performance. Moreover, on LMEB, reranking models demonstrate a clear advantage, with even the 0.27B Nano model remaining competitive with 7-12B embedding models.
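The decoupling described above can be illustrated with a toy sketch: passages are compressed once, offline, into a few cached vectors, and at query time only the query side is computed before cross-attending over that cache. Everything here is an assumption made for illustration and not the paper's implementation: `toy_embed` is a hash-based stand-in for a learned encoder, chunked mean pooling stands in for Matryoshka embedding pooling, and `encode_passage`, `build_cache`, `cross_attention_score`, and `rerank` are hypothetical names.

```python
import hashlib
import math

DIM = 8  # toy embedding width (assumption, not from the paper)


def toy_embed(token: str) -> list[float]:
    """Deterministic stand-in for a learned token embedding."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(byte - 127.5) / 127.5 for byte in digest[:DIM]]


def encode_passage(text: str, num_slots: int = 2) -> list[list[float]]:
    """Offline step: compress a passage into at most num_slots pooled vectors."""
    vectors = [toy_embed(token) for token in text.lower().split()]
    if not vectors:
        raise ValueError("passage must contain at least one token")
    size = math.ceil(len(vectors) / num_slots)
    chunks = [vectors[i:i + size] for i in range(0, len(vectors), size)]
    # Mean-pool each chunk column by column.
    return [[sum(col) / len(chunk) for col in zip(*chunk)] for chunk in chunks]


def build_cache(passages: dict[str, str], num_slots: int = 2) -> dict[str, list[list[float]]]:
    """Pre-encode every passage once; no query is needed at this stage."""
    return {pid: encode_passage(text, num_slots) for pid, text in passages.items()}


def cross_attention_score(query: str, slots: list[list[float]]) -> float:
    """Online step: each query token attends over the cached passage slots."""
    q_vectors = [toy_embed(token) for token in query.lower().split()]
    if not q_vectors:
        raise ValueError("query must contain at least one token")
    total = 0.0
    for q in q_vectors:
        logits = [sum(a * b for a, b in zip(q, s)) / math.sqrt(DIM) for s in slots]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(logit - peak) for logit in logits]
        norm = sum(weights)
        attended = [
            sum(w * s[d] for w, s in zip(weights, slots)) / norm for d in range(DIM)
        ]
        total += sum(a * b for a, b in zip(q, attended))
    return total / len(q_vectors)


def rerank(query: str, cache: dict[str, list[list[float]]]) -> list[tuple[str, float]]:
    """Score every cached passage against the query, highest score first."""
    scored = [(pid, cross_attention_score(query, slots)) for pid, slots in cache.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch the per-query cost scales with the number of cached slots per passage rather than with passage length, which is the kind of saving that decoupled passage encoding aims for. Because the embeddings are hash-based, the resulting scores carry no semantic meaning; only the control flow is illustrative.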