SIREN: Unified Multi-Granularity Semantic Interaction for Multi-Modal Lifelong User Interest Modeling

2026-05-25, Information Retrieval

AI summary

The authors present SIREN, a method for modeling users' long-term interests that integrates multi-modal content with user behavior histories more tightly. Unlike earlier methods that model the multi-modal and behavior sequences separately and fuse them late, SIREN lets multi-modal signals interact with collaborative ID features inside a single target-conditioned transformer. Offline experiments show state-of-the-art GAUC, and online A/B tests show GMV gains of +1.61% to +3.87% across Weixin Moments, Weixin Official Accounts, and Weixin Channels. Since July 2025, SIREN has served full traffic in Tencent's advertising platform.

recommender systems, multi-modal data, lifelong user interest modeling, semantic interaction, transformer architecture, retrieval strategies, GAUC, online A/B testing, Tencent advertising platform, collaborative filtering
Authors
Yaqian Zhang, Ruyi Yu, Tianyi Li, Bohan Liu, Maoquan Ye, Ke Wang, Shifeng Wen, Junwei Pan, Lijie Wang, Qi Zhou, Yeshou Cai, Chengguo Yin, Lifeng Wang, Hui Li, Lei Xiao, Haijie Gu
Abstract
Industrial recommender systems increasingly leverage lifelong user behavior histories and rich multi-modal content to capture evolving user preferences. However, effectively integrating multi-modal features into lifelong interest modeling remains challenging due to the inherent misalignment between multi-modal and collaborative spaces. Existing paradigms typically model the multi-modal sequence and the behavior sequence separately and rely on late fusion to alleviate the modality gap, which results in coarse-grained multi-modal representations and limited integration. In this paper, we propose SIREN, a unified multi-granularity semantic interaction framework for multi-modal lifelong user interest modeling. In the General Search Unit stage, we introduce two alternative retrieval strategies: multi-modal similarity-based soft retrieval for retrieval effectiveness, and Semantic ID (SemID)-based hard retrieval for efficient industrial serving. For the Exact Search Unit stage, we explicitly incorporate target-aware relevance via coarse similarity buckets and fine-grained prefix-encoded SemIDs, enabling unified interaction with collaborative ID features within the target-conditioned transformer architecture. Extensive experiments on an offline dataset demonstrate that SIREN achieves state-of-the-art GAUC. Online A/B tests further demonstrate consistent GMV gains across multiple production scenarios, including +2.28% in Weixin Moments, +3.87% in Weixin Official Accounts, and +1.61% in Weixin Channels. Since July 2025, SIREN has been fully launched, serving full traffic in Tencent's advertising platform.
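
To make the two retrieval strategies and the two relevance granularities concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, the SemID matching rule, the bucket boundaries, or the prefix token format, so cosine similarity, shared-prefix matching, uniform buckets, and underscore-joined tokens are all assumed here. Every function and field name (`soft_retrieve`, `hard_retrieve`, `similarity_bucket`, `prefix_tokens`, `emb`, `semid`) is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def soft_retrieve(target_emb, history, k):
    """Soft retrieval: keep the k behaviors most similar to the target."""
    scored = [(cosine(target_emb, item["emb"]), item) for item in history]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def shared_prefix_len(a, b):
    """Number of leading SemID codes that two items have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def hard_retrieve(target_semid, history, min_prefix):
    """Hard retrieval: keep behaviors whose SemID shares at least
    min_prefix leading codes with the target (assumed matching rule)."""
    return [item for item in history
            if shared_prefix_len(target_semid, item["semid"]) >= min_prefix]

def similarity_bucket(sim, num_buckets=4):
    """Coarse signal: map a similarity in [-1, 1] to a uniform bucket id."""
    scaled = (sim + 1.0) / 2.0
    return min(int(scaled * num_buckets), num_buckets - 1)

def prefix_tokens(semid):
    """Fine-grained signal: one token per SemID prefix,
    e.g. (3, 1, 4) becomes ["3", "3_1", "3_1_4"]."""
    return ["_".join(str(c) for c in semid[:i + 1]) for i in range(len(semid))]

# Toy behavior history; embeddings and SemIDs are made up for illustration.
history = [
    {"id": "a", "emb": [1.0, 0.0], "semid": (3, 1, 4)},
    {"id": "b", "emb": [0.0, 1.0], "semid": (3, 2, 0)},
    {"id": "c", "emb": [0.8, 0.6], "semid": (3, 1, 7)},
    {"id": "d", "emb": [-1.0, 0.0], "semid": (5, 0, 2)},
]
target_emb, target_semid = [1.0, 0.0], (3, 1, 9)

soft_ids = [item["id"] for _, item in soft_retrieve(target_emb, history, k=2)]
hard_ids = [item["id"] for item in hard_retrieve(target_semid, history, min_prefix=2)]
```

In this toy setup both strategies select items `a` and `c`. The hard variant needs only discrete code comparisons, which is consistent with the abstract's framing of SemID retrieval as the efficient option for serving. In a model along the lines described, the bucket ids and prefix tokens would presumably be embedded and fed to the transformer alongside collaborative ID features; that step is omitted here.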