Long-Term Embeddings for Balanced Personalization
2026-04-09 • Machine Learning
AI summary
The authors identify that current transformer-based recommendation systems focus too much on recent user behavior, ignoring long-term preferences. Instead of using longer sequences that are expensive and less effective, they create Long-Term Embeddings (LTE) to represent stable user interests. They solve a practical problem where feature updates cause inconsistencies between training and deployment by fixing the embeddings to a stable semantic space. Testing on an online shopping platform shows that adding LTE improves user engagement and business outcomes.
transformer, recommendation systems, recency bias, long-term preferences, embeddings, feature store, offline-online mismatch, causal language modeling, autoencoder, A/B testing
Authors
Andrii Dzhoha, Egor Malykh
Abstract
Modern transformer-based sequential recommenders excel at capturing short-term intent but often suffer from recency bias, overlooking stable long-term preferences. While extending sequence lengths is an intuitive fix, it is computationally inefficient, and recent interactions tend to dominate the model's attention. We propose Long-Term Embeddings (LTE) as a high-inertia contextual anchor to bridge this gap. We address a critical production challenge: the point-in-time consistency problem caused by infrastructure constraints, as feature stores typically host only a single "live" version of features. This leads to an offline-online mismatch during model deployments and rollbacks, as models are forced to process evolved representations they never saw during training. To resolve this, we introduce an LTE framework that constrains embeddings to a fixed semantic basis of content-based item representations, ensuring cross-version compatibility. Furthermore, we investigate integration strategies for causal language modeling, considering the data leakage issue that occurs when the LTE and the transformer's short-term sequence share a temporal horizon. We evaluate two representations: a heuristic average and an asymmetric autoencoder with a fixed decoder grounded in the semantic basis to enable behavioral fine-tuning while maintaining stability. Online A/B tests on Zalando demonstrate that integrating LTE as a contextual prefix token using a lagged window yields significant uplifts in both user engagement and financial metrics.
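The heuristic-average representation and the lagged window can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `lag` and `horizon` parameters, and the unweighted mean are assumptions made for illustration, since the abstract does not specify window sizes or weighting. The idea it shows is that the LTE is built only from fixed content-based item embeddings (the semantic basis), and only from interactions older than the lag, so it does not overlap with the short-term sequence the transformer is trained to predict.

```python
import numpy as np


def lagged_average_lte(item_ids, timestamps, item_embeddings, now, lag, horizon):
    """Average fixed content-based item embeddings over a lagged window.

    Only interactions with now - horizon <= t < now - lag contribute, so the
    result excludes the most recent `lag` time units (hypothetical parameter).
    """
    item_ids = np.asarray(item_ids)
    timestamps = np.asarray(timestamps, dtype=float)
    mask = (timestamps >= now - horizon) & (timestamps < now - lag)
    if not mask.any():
        # No long-term history available: fall back to a zero vector (assumption).
        return np.zeros(item_embeddings.shape[1])
    # The mean of basis vectors stays in the span of the fixed semantic basis.
    return item_embeddings[item_ids[mask]].mean(axis=0)


def with_prefix_token(lte, sequence_embeddings):
    """Prepend the LTE as a single contextual token before the short-term sequence."""
    return np.vstack([lte[None, :], sequence_embeddings])


# Toy data: three items with 2-dimensional content embeddings.
item_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
lte = lagged_average_lte(
    item_ids=[0, 1, 2],
    timestamps=[10, 20, 95],  # the interaction at t=95 falls inside the lag
    item_embeddings=item_embeddings,
    now=100,
    lag=10,
    horizon=100,
)
tokens = with_prefix_token(lte, item_embeddings[[2, 0]])
```

Because the average is a combination of the same fixed item embeddings regardless of when it is computed, a model trained against an older snapshot still receives inputs in the space it saw during training, which is the cross-version compatibility property the abstract describes.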