Decoupling Vector Data and Index Storage for Space Efficiency
2026-04-10 • Databases
Databases, Operating Systems
AI summary
The authors looked at how current systems store large vectors and their search helpers together, which wastes storage, forces extra reads during searches, and amplifies writes during updates. They created DecoupleVS, a system that separates the actual vector data from the helper information so that each can be handled more efficiently. This separation allows better compression and faster searches and updates. Testing on billion-scale datasets showed that DecoupleVS uses up to 58.7% less storage while keeping or improving search speed and accuracy compared to current systems.
approximate nearest neighbor search, vector data, index metadata, storage compression, data layout, update performance, search accuracy, disk-based systems, write amplification
Authors
Yuanming Ren, Juncheng Zhang, Yanjing Ren, Rui Yang, Di Wu, Patrick P. C. Lee
Abstract
Managing large-scale vector datasets with disk-based approximate nearest neighbor search (ANNS) systems faces critical efficiency challenges stemming from the co-location of vector data and auxiliary index metadata. Our analysis of state-of-the-art ANNS systems reveals that such co-location incurs substantial storage overhead, generates excessive reads during search queries, and causes severe write amplification during updates. We present DecoupleVS, a decoupled vector storage management framework that enables specialized optimizations for vector data and auxiliary index metadata. DecoupleVS incorporates various design techniques for effective compression, data layouts, search queries, and updates, so as to significantly reduce storage space, while maintaining high search and update performance and high search accuracy. Evaluation on real-world public and proprietary billion-scale datasets shows that DecoupleVS reduces storage space by up to 58.7%, while delivering competitive or improved search query and update performance, compared to state-of-the-art monolithic disk-based ANNS systems.