A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models
2026-03-10 • Information Retrieval
AI summary
The authors studied how to make a search system called ColBERT use less storage by keeping fewer word representations. Instead of guessing which words to ignore, they used a math idea involving shapes called Voronoi cells to decide which words are most important in the search space. Their method helps the system stay accurate while using less space and also helps understand how individual words affect search results.
ColBERT, dense retrieval, token embedding, index pruning, Voronoi cell, embedding space, hyperspace geometry, information retrieval
Authors
Yash Kankanampati, Yuxuan Zong, Nadi Tomeh, Benjamin Piwowarski, Joseph Le Roux
Abstract
Late-interaction models like ColBERT offer competitive performance across various retrieval tasks, but require storing a dense embedding for each document token, leading to substantial index storage overhead. Past works address this by pruning low-importance token embeddings based on statistical and empirical measures, but these measures often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as the measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Through our experiments, we demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
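To make the core idea concrete, here is a minimal sketch of Voronoi-cell-based pruning, not the authors' actual method: each token embedding's "influence" is estimated as the fraction of the (unit-sphere) embedding space for which it is the nearest neighbor, approximated by Monte Carlo sampling, and tokens with the smallest estimated cells are pruned. The function names (`estimate_voronoi_measure`, `prune_tokens`), the uniform-sphere sampling, and the `keep_ratio` parameter are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def estimate_voronoi_measure(token_embs, n_samples=10_000, seed=0):
    """Monte Carlo estimate of each token's Voronoi cell measure.

    Assumes token embeddings are L2-normalized (as in ColBERT), so the
    relevant space is the unit hypersphere and nearest-neighbor assignment
    under cosine similarity partitions it into Voronoi cells.
    """
    rng = np.random.default_rng(seed)
    # Sample directions uniformly on the unit hypersphere.
    samples = rng.standard_normal((n_samples, token_embs.shape[1]))
    samples /= np.linalg.norm(samples, axis=1, keepdims=True)
    # Assign each sample to its nearest token (max cosine similarity).
    nearest = (samples @ token_embs.T).argmax(axis=1)
    counts = np.bincount(nearest, minlength=token_embs.shape[0])
    # Fraction of samples landing in each cell ~ cell's surface measure.
    return counts / n_samples

def prune_tokens(token_embs, keep_ratio=0.5):
    """Keep the tokens with the largest estimated Voronoi cells."""
    measures = estimate_voronoi_measure(token_embs)
    k = max(1, int(len(token_embs) * keep_ratio))
    keep = np.sort(np.argsort(-measures)[:k])
    return token_embs[keep], keep
```

In this toy setting, a token whose embedding sits far from its neighbors claims a large region of the sphere and survives pruning, while redundant tokens clustered near others claim little measure and are dropped, which is one way to read "token influence as the measure of its Voronoi region".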