KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

2026-06-22

Computation and Language
AI summary

The authors introduce KaLM-Reranker-V1, a reranker that speeds up search-result reranking by processing queries and passages separately while still using cross-attention to match them. The encoder encodes passages ahead of time; the decoder then models the instructions and the user's query and attends to the stored passage representations. The authors evaluate three model sizes (Nano, Small, and Large, with 0.27B, 1B, and 4B activated parameters) on BEIR, MIRACL, and LMEB. They report results on par with strong models such as the Qwen3-Reranker series on BEIR, and strong results on the multilingual MIRACL benchmark despite limited multilingual training. The results suggest that the method can deliver strong, fast reranking without very large models or extensive multilingual training.

reranking, encoder-decoder architecture, cross-attention, query encoding, passage encoding, Matryoshka embedding pooling, BEIR dataset, MIRACL dataset, LMEB dataset, embedding models
Authors
Xinping Zhao, Jiaxin Xu, Ziqi Dai, Xin Zhang, Shouzheng Huang, Danyu Tang, Xinshuo Hu, Meishan Zhang, Baotian Hu, Min Zhang
Abstract
As retrieval systems scale, high-quality reranking becomes increasingly important. However, most existing rerankers, whether encoder-based or decoder-based, jointly encode the query and passage, tightly coupling their computation and limiting deployment efficiency as well as flexibility. We present KaLM-Reranker-V1, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling. Built on an encoder-decoder architecture, KaLM-Reranker-V1 uses the encoder to pre-encode passages with Matryoshka embedding pooling, while the decoder models the system instruction, user instruction, and query intent; cross-attention then captures relevance between the query context and passage representations. This design makes KaLM-Reranker-V1 efficient through decoupled passage encoding, yet not late interaction, by preserving rich relevance modeling through cross-attention. We instantiate KaLM-Reranker-V1 in three sizes, Nano, Small, and Large, with 0.27B, 1B, and 4B activated parameters, respectively. Extensive experiments on BEIR, MIRACL, and LMEB demonstrate that KaLM-Reranker-V1 achieves strong reranking performance with superior efficiency. On BEIR, KaLM-Reranker-V1 achieves state-of-the-art performance, on par with strong industrial models such as the Qwen3-Reranker series; on MIRACL, despite not being extensively trained on multilingual data, KaLM-Reranker-V1 still shows excellent reranking performance. Moreover, on LMEB, reranking models demonstrate a clear advantage, with even the 0.27B Nano model remaining competitive with 7-12B embedding models.
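The decoupling described in the abstract can be illustrated with a minimal sketch: passages are compressed into a few vectors once, offline, and at query time only the query side is computed and cross-attended against the cached vectors. This is a toy illustration, not the authors' implementation. The function names (`pool_passage`, `cross_attention_score`, `rerank`) are hypothetical, chunked mean pooling stands in for Matryoshka embedding pooling (whose details the abstract does not give), and a single unparameterized attention step stands in for the decoder.

```python
import math

def pool_passage(token_vecs, num_slots):
    """Compress a passage into at most num_slots vectors by mean-pooling
    contiguous chunks of tokens. ASSUMPTION: a stand-in for the paper's
    Matryoshka embedding pooling, which the abstract does not specify."""
    n = len(token_vecs)
    num_slots = min(num_slots, n)
    dim = len(token_vecs[0])
    pooled = []
    for s in range(num_slots):
        chunk = token_vecs[s * n // num_slots:(s + 1) * n // num_slots]
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled

def cross_attention_score(query_vecs, passage_vecs):
    """Score a passage by letting each query vector attend over the cached
    passage vectors (scaled dot-product attention), then averaging the dot
    product between each query vector and its attended context."""
    dim = len(query_vecs[0])
    scale = 1.0 / math.sqrt(dim)
    total = 0.0
    for q in query_vecs:
        logits = [scale * sum(a * b for a, b in zip(q, p)) for p in passage_vecs]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - peak) for l in logits]
        norm = sum(weights)
        context = [sum(w / norm * p[d] for w, p in zip(weights, passage_vecs))
                   for d in range(dim)]
        total += sum(a * b for a, b in zip(q, context))
    return total / len(query_vecs)

def rerank(query_vecs, passage_cache):
    """Return (passage_id, score) pairs sorted from most to least relevant."""
    scored = [(pid, cross_attention_score(query_vecs, vecs))
              for pid, vecs in passage_cache.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Offline: encode and pool every passage once, then cache the result.
passage_cache = {
    "a": pool_passage([[1.0, 0.0], [1.0, 0.0]], num_slots=1),
    "b": pool_passage([[-1.0, 0.0], [-1.0, 0.0]], num_slots=1),
}
# Online: only the query is processed, against the cached passage vectors.
ranking = rerank([[1.0, 0.0]], passage_cache)
```

In this sketch the per-query cost depends on the number of pooled vectors per passage rather than on the passage length, which is the efficiency argument the abstract makes. Unlike late-interaction scoring, the query representation here is mixed with passage content through attention before the score is computed.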