NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models
2026-05-11 • Information Retrieval
AI summary
The authors worked on improving search systems that need to understand numbers in questions, like finding companies that spend more than a billion dollars on research. They noticed current methods either ignore numbers or handle them separately, which causes problems for real-world use. Their solution, NumColBERT, enhances number understanding directly within an existing search model without changing how it works overall. They added a special mechanism to focus on important numbers and a training method to help the system grasp numeric details better. Tests showed their approach works well, matching or beating previous methods, while staying easy to deploy and maintain.
Dense retrieval · Numerical conditions · Late-interaction model · ColBERT · Numerical Gating Mechanism · Contrastive learning · Embedding space · MaxSim scoring · Information retrieval · Numerical reasoning
Authors
Haruki Fujimaki, Makoto P. Kato
Abstract
This study addresses the challenge of improving dense retrieval performance for queries containing numerical conditions, such as "companies with more than one billion dollars in R&D expenditure." Although recent research has shown that standard models struggle with numeric information in domains such as finance, e-commerce, and medicine, existing solutions typically decompose queries into textual and numerical components and score them separately. These approaches modify late-interaction retrieval models such as ColBERT and introduce challenges in deployment, latency, and maintainability. To overcome these limitations, we propose NumColBERT, a method that enhances numerically conditioned retrieval while remaining non-intrusive at inference time, preserving the original late-interaction mechanism. Because NumColBERT retains the standard ColBERT indexing and MaxSim scoring pipeline, existing optimizations and ecosystem components can be reused directly, facilitating practical deployment. NumColBERT introduces a Numerical Gating Mechanism and a Numerical Contrastive Learning objective so that numerical conditions contribute more effectively within standard ColBERT scoring. The gating mechanism amplifies tokens carrying critical numerical constraints while suppressing context-neutral numerical mentions, and the contrastive objective shapes the embedding space to reflect numerical magnitudes, units, and conditions. Experimental results show that NumColBERT substantially outperforms standard fine-tuning baselines and achieves accuracy comparable to or better than prior approaches that rely on separate textual and numerical scoring. These findings demonstrate the feasibility of numerically conditioned retrieval with a non-intrusive inference pipeline and present a maintainable solution for real-world deployment.
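The abstract describes the pipeline only at a high level, so the sketch below illustrates one plausible reading of it in PyTorch: standard MaxSim scoring left intact, a query-side gate that rescales token embeddings so numerically constrained tokens carry more weight, and an InfoNCE-style contrastive loss over numerically perturbed negatives. The gate's 2·sigmoid form, the loss construction, and all names here (`NumericalGate`, `numerical_contrastive_loss`, `tau`) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

def maxsim_score(q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
    """Standard ColBERT late-interaction (MaxSim) relevance score.

    q_emb: (num_query_tokens, dim) query token embeddings
    d_emb: (num_doc_tokens, dim) document token embeddings (unit-normalized)
    Each query token takes its maximum similarity over all document tokens;
    the per-token maxima are summed into a single relevance score.
    """
    sim = q_emb @ d_emb.T                 # (num_q, num_d) token similarities
    return sim.max(dim=1).values.sum()    # sum of per-query-token maxima

class NumericalGate(nn.Module):
    """Hypothetical per-token gate on the query side (an assumption).

    Each query token embedding is rescaled by a learned scalar in (0, 2),
    so tokens carrying numeric constraints can be amplified and incidental
    numeric mentions suppressed. Only the query encoder changes; the
    document index and MaxSim scoring stay untouched.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, 1)

    def forward(self, q_emb: torch.Tensor) -> torch.Tensor:
        gate = 2.0 * torch.sigmoid(self.proj(q_emb))  # (num_q, 1), centered at 1
        return gate * q_emb                           # reweighted, not re-normalized

def numerical_contrastive_loss(q_emb, pos_emb, neg_embs, tau: float = 0.05):
    """Sketch of an InfoNCE-style numerical contrastive objective.

    The positive is a passage satisfying the query's numeric condition;
    the negatives are assumed to differ only in magnitude, unit, or
    comparison direction (e.g. "more than" vs. "less than"). The exact
    construction of negatives and the temperature tau are assumptions.
    """
    scores = torch.stack(
        [maxsim_score(q_emb, pos_emb)]
        + [maxsim_score(q_emb, d) for d in neg_embs]
    ) / tau
    return -torch.log_softmax(scores, dim=0)[0]  # -log p(positive)
```

Under this reading, the gate changes only the norms of query token embeddings, so a document index built by a stock ColBERT pipeline could be reused unchanged, which would be consistent with the inference-time non-intrusiveness the abstract emphasizes.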