URecJPQ: Memory-efficient Multimodal Recommendation Models through RecJPQ in Large-Scale Scenarios
2026-06-22 • Information Retrieval
AI summary
The authors address the problem of large memory use when training recommendation systems that handle many users and items, especially when including different types of item information (multimodal features). They propose URecJPQ, a method that breaks down user and item embeddings into shared smaller parts instead of unique large embeddings, which greatly reduces memory needs. Tests on datasets from movies and product domains show their method drastically lowers the number of parameters and storage size with only a small drop in accuracy, and sometimes even improves performance. This makes it easier to train large recommendation models when resources are limited.
recommendation systems, ID embeddings, multimodal features, product quantization, memory efficiency, top-k recommendation, embedding compression, trainable parameters, recall, NDCG
Authors
Giuseppe Spillo, Zixuan Yi, Aleksandr Petrov, Cataldo Musto, Craig Macdonald, Iadh Ounis
Abstract
Training state-of-the-art recommendation models on large-scale industrial datasets can be challenging because of the large number of users and items, which are typically represented through ID embeddings. Such embeddings require a large amount of memory, which is not always available. This problem is exacerbated in multimodal recommendation, where multimodal item features generally improve recommendation performance but require more resources to encode. In this paper, we introduce URecJPQ, a Joint Product Quantization method specifically designed for large-scale and multimodal top-k recommendation tasks, in which the vast number of users and items, combined with the available modalities, further increases memory demands. The core idea is to represent each user/item not as a fully learned, unique embedding, but rather as a concatenation of shared learned sub-embeddings, thereby significantly reducing the total number of trainable parameters. Our experiments on three widely used datasets across different domains (movies, baby products, and sports products) show that URecJPQ can be effectively applied to multimodal recommendation settings. In large-scale scenarios, we observe a substantial reduction in checkpoint sizes (86% to 98%) and in the number of trainable parameters (98% to 99%), with only a marginal decrease in accuracy (8.5% on recall and 16% on NDCG, on average) and, in some cases, even performance improvements (up to 85%), as in the baby products domain. Our codebase is available at https://anonymous.4open.science/r/large_mmrecjpq-839B/README.md.
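The core idea of building each user/item embedding by concatenating shared sub-embeddings can be illustrated with a minimal NumPy sketch. It is not taken from the authors' codebase: the function names, the sizes (10,000 entities, 64 dimensions, 8 splits, 256 sub-embeddings per split), and the random code assignment are all assumptions chosen for illustration, and the abstract does not state how URecJPQ assigns codes.

```python
import numpy as np


def build_embeddings(codes: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Reconstruct embeddings by concatenating shared sub-embeddings.

    codes:     (n_entities, n_splits) integer indices, one per split.
    codebooks: (n_splits, n_centroids, sub_dim) shared sub-embeddings
               (these would be the trainable parameters).
    Returns an array of shape (n_entities, n_splits * sub_dim).
    """
    n_splits = codebooks.shape[0]
    # parts[i, s] = codebooks[s, codes[i, s]], shape (n_entities, n_splits, sub_dim)
    parts = codebooks[np.arange(n_splits), codes]
    # Flattening the last two axes concatenates the sub-embeddings in split order.
    return parts.reshape(codes.shape[0], -1)


def parameter_counts(n_entities: int, dim: int, n_splits: int, n_centroids: int):
    """Return (full ID-embedding table size, shared codebook size) in floats."""
    sub_dim = dim // n_splits
    full = n_entities * dim
    shared = n_splits * n_centroids * sub_dim
    return full, shared


# Hypothetical sizes, chosen only for illustration.
n_entities, dim, n_splits, n_centroids = 10_000, 64, 8, 256
sub_dim = dim // n_splits

rng = np.random.default_rng(0)
# Assumption: codes are drawn at random here; a real system would assign them
# with some dedicated strategy before or during training.
codes = rng.integers(0, n_centroids, size=(n_entities, n_splits))
codebooks = rng.normal(size=(n_splits, n_centroids, sub_dim)).astype(np.float32)

embeddings = build_embeddings(codes, codebooks)
full, shared = parameter_counts(n_entities, dim, n_splits, n_centroids)
print(embeddings.shape)
print(f"full table: {full} floats, shared codebooks: {shared} floats")
print(f"reduction in float parameters: {1 - shared / full:.2%}")
```

In this sketch the codebook size depends only on the embedding dimension and the number of sub-embeddings per split, not on the number of users or items, which is why the saving grows with the size of the catalogue. The integer codes still have to be stored per entity, but they are small and not trained by gradient descent.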