CS-PQ: Cache-Friendly SIMD Product Quantization for Large-Scale ANNS Index Construction

2026-05-25 • Databases

Databases

AI summaryⓘ

The authors focus on making a process called Product Quantization (PQ), which helps with fast searching in large sets of data, run much faster on regular computer processors (CPUs). They point out that previous methods didn't work well due to how data is handled and computed. Their new approach, CS-PQ, changes how calculations are done to better use the CPU's capabilities and memory, speeding things up without reducing accuracy. Tests show their method can be over ten times faster than existing CPU-based solutions.

Product QuantizationApproximate Nearest Neighbor SearchSIMDcache localityvectorizationCPU optimizationquantization granularitydata transfer overheadexecution pipeline

Authors

Y. T. Ma, K. C. Huang, X. K. Jiang, M. L. Wang, X. Yao, R. H. Chen, G. Zhang, Z. L. Shao

Abstract

Product Quantization (PQ) construction is deeply integrated into vector index construction for Approximate Nearest Neighbor Search (ANNS). The rapid growth in vector dimensionality and volume has significantly increased the computational cost of PQ. Existing GPU-based PQ accelerations are ill-suited for PQ construction due to its "one-to-one" execution pattern (one compute, one data load, i.e., data transfer overhead dominates). Although CPU-based solutions are prevalent, they are essentially general-purpose designs that fail to capture the intrinsic characteristics of PQ construction.In this paper, we propose CS-PQ, a Cache-friendly, SIMD-optimized PQ framework based on modern CPUs. CS-PQ introduces a vector-oriented SIMD paradigm that decouples quantization granularity from SIMD width by vectorizing across PQ centroids rather than subvector dimensions. It further restructures the execution pipeline to improve cache locality and reformulates PQ computation to eliminate redundant operations while preserving correctness. Experiments on large-scale datasets show that CS-PQ achieves up to 10.7 times speedup over state-of-the-art CPU-based PQ implementations without sacrificing ANNS accuracy.

View PDFOpen arXiv