LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

2026-05-11 · Information Retrieval

AI summary

The authors address the problem of making large language models (LLMs) more efficient for recommendation systems by using latent reasoning, which performs multi-step inference over hidden states instead of generating text step by step. They identify challenges such as the lack of inherent meaning in Semantic ID symbols, representation drift, and globally fixed reasoning lengths. Their solution, called LASAR, uses a two-stage training process and reinforcement learning to align latent reasoning with chain-of-thought semantics and to control reasoning depth dynamically. LASAR improves recommendation quality while greatly reducing computation time compared to explicit chain-of-thought generation. Tests on three real-world datasets show it outperforms existing approaches while adding only minimal extra latency.

Large Language Models · Chain-of-Thought · Latent Reasoning · Semantic ID · Reinforcement Learning · KL Divergence · Policy Gradient · Recommendation Systems · Representation Drift · Reasoning Depth
Authors
Yiwen Chen, Fuwei Zhang, Zehao Chen, Deqing Wang, Hehan Li, Peizhi Xu, Hanmeng Liu, Shuanglong Li, Xin Pei, Fuzhen Zhuang, Zhao Zhang
Abstract
Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) in various tasks, yet the inefficiency of token-by-token generation hinders real-world deployment in latency-sensitive recommender systems. Latent reasoning has emerged as an effective paradigm in LLMs, performing multi-step inference in a continuous hidden-state space to achieve stronger reasoning at lower cost. However, this paradigm remains underexplored in mainstream generative recommendation. Adapting it reveals three unique challenges: (1) the gap between prior-less Semantic ID (SID) symbols and continuous latent reasoning: SIDs lack pre-trained semantics, hindering joint optimization; (2) representation drift due to a lack of reasoning chain supervision; and (3) the suboptimality of applying a globally fixed reasoning depth. To address these, we propose LASAR (Latent Adaptive Semantic Aligned Reasoning), an SFT-then-RL framework. First, we bridge this gap via two-stage training: Stage 1 grounds SID semantics before Stage 2 introduces latent reasoning, ensuring efficient convergence. Second, we mitigate representation drift through explicit CoT semantic alignment. Step-wise bidirectional KL divergence constrains the latent reasoning trajectory using hidden-state anchors extracted from CoT text, while a Policy Head predicts per-sample reasoning depth. Third, during the GRPO-based RL phase, terminal-only KL alignment accommodates variable-length reasoning, and REINFORCE optimizes the Policy Head to dynamically allocate steps. This nearly halves the average latent step count while simultaneously improving recommendation quality. Experiments on three real-world datasets demonstrate that LASAR outperforms all baselines. It adds marginal inference latency and is roughly 20 times faster than generating explicit CoT text.
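The abstract's two key mechanisms can be sketched concretely. Below is a minimal, illustrative numpy implementation of (a) the step-wise bidirectional (symmetric) KL penalty between latent reasoning states and CoT hidden-state anchors, and (b) a REINFORCE gradient for a Policy Head that samples a per-sample reasoning depth. The projection matrix, the choice of softmax over a shared projection to obtain distributions, and the function names are all assumptions for illustration; the paper's actual architecture and loss weighting are not specified here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl(p, q, eps=1e-9):
    # KL(p || q) per row, with a small epsilon for numerical safety.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def stepwise_bidirectional_kl(latent_states, cot_anchors, proj):
    """Step-wise symmetric KL between latent reasoning states and
    hidden-state anchors extracted from CoT text. Both are projected
    through a shared (assumed) matrix `proj` and softmaxed so each
    reasoning step yields a comparable distribution.

    latent_states, cot_anchors: (steps, hidden); proj: (hidden, dim).
    """
    p = softmax(latent_states @ proj)  # latent-side per-step distributions
    q = softmax(cot_anchors @ proj)    # CoT-anchor per-step distributions
    return float(np.mean(kl(p, q) + kl(q, p)))  # averaged over steps

def policy_head_reinforce_grad(logits, depth, reward, baseline):
    """REINFORCE gradient w.r.t. the Policy Head's depth logits:
    advantage * grad log pi(depth). `depth` is the sampled reasoning
    depth index; `baseline` is a variance-reduction term (assumed)."""
    pi = softmax(logits)
    grad_log_pi = -pi
    grad_log_pi[depth] += 1.0  # d/d logits of log softmax = onehot - pi
    return (reward - baseline) * grad_log_pi
```

The symmetric penalty is zero only when the latent trajectory matches its CoT anchors step by step, which is the drift-prevention behavior the abstract describes; the REINFORCE term pushes probability toward depths whose reward exceeds the baseline, letting short reasoning chains win when they suffice.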