Efficient Dataset Selection for Continual Adaptation of Generative Recommenders

2026-04-09 | Information Retrieval

Information Retrieval, Machine Learning
AI summary

The authors studied how recommendation systems can keep working well as user interests change over time without frequent full retraining, which is too slow at the scale of large streaming datasets. They tested different ways of picking small but informative subsets of user interaction data for the system to adapt on. They found that representing data points by their gradients, and choosing subsets whose distribution matches the overall data, helps the system learn efficiently and stay accurate. This approach can make recommendation systems better at handling change while saving computing resources.

recommendation systems, data drift, streaming data, data selection, gradient-based representation, distribution matching, model retraining, scalability, adaptive learning, user interaction data
Authors
Cathy Jiao, Juan Elenter, Praveen Ravichandran, Bernd Huber, Joseph Cauteruccio, Todd Wasson, Timothy Heath, Chenyan Xiong, Mounia Lalmas, Paul Bennett
Abstract
Recommendation systems must continuously adapt to evolving user behavior, yet the volume of data generated in large-scale streaming environments makes frequent full retraining impractical. This work investigates how targeted data selection can mitigate performance degradation caused by temporal distributional drift while maintaining scalability. We evaluate a range of representation choices and sampling strategies for curating small but informative subsets of user interaction data. Our results demonstrate that gradient-based representations, coupled with distribution-matching, improve downstream model performance, achieving training efficiency gains while preserving robustness to drift. These findings highlight data curation as a practical mechanism for scalable monitoring and adaptive model updates in production-scale recommendation systems.
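The abstract does not specify which gradient representation or distribution-matching objective is used. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes (1) each interaction is represented by the per-example gradient of a logistic loss for a hypothetical linear scorer, and (2) "distribution matching" is approximated by greedy mean matching in gradient space, i.e. minimizing a linear-kernel MMD between the selected subset and the full pool. All function names and data here are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_embedding(w, x, y):
    """Per-example gradient of the logistic loss w.r.t. weights w: (sigmoid(w.x) - y) * x.

    Assumption: a linear scorer stands in for the recommender's model.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def mean(vectors):
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def select_distribution_matched(embeddings, k):
    """Greedily pick k indices whose subset mean stays closest to the pool mean.

    Greedy mean matching (MMD with a linear kernel) is used here as a stand-in
    for distribution matching; the paper's actual objective may differ.
    """
    target = mean(embeddings)
    dim = len(target)
    chosen = []
    running_sum = [0.0] * dim
    remaining = set(range(len(embeddings)))
    for step in range(1, k + 1):
        best_idx, best_cost = None, float("inf")
        # Try adding each remaining example and keep the one that best matches the target mean.
        for i in sorted(remaining):
            candidate_mean = [(running_sum[j] + embeddings[i][j]) / step for j in range(dim)]
            cost = sq_dist(candidate_mean, target)
            if cost < best_cost:
                best_idx, best_cost = i, cost
        chosen.append(best_idx)
        remaining.remove(best_idx)
        running_sum = [running_sum[j] + embeddings[best_idx][j] for j in range(dim)]
    return chosen


if __name__ == "__main__":
    random.seed(0)
    w = [0.5, -0.25, 0.1]  # hypothetical current model weights
    # Hypothetical pool of (features, clicked) interactions from a new time window.
    pool = [([random.gauss(0, 1) for _ in range(3)], random.randint(0, 1)) for _ in range(200)]
    embs = [gradient_embedding(w, x, y) for x, y in pool]
    subset = select_distribution_matched(embs, k=20)
    print("selected", len(subset), "of", len(pool))
    print("mean-gap (matched):", sq_dist(mean([embs[i] for i in subset]), mean(embs)))
```

Representing examples by gradients makes the selection sensitive to what the current model would actually learn from each interaction, while the matching step keeps the small subset representative of the new data distribution rather than biased toward outliers. A production system would likely use a richer kernel, lower-dimensional gradient projections, and batched computation instead of this quadratic-time loop.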