From Item-Only to Query-Item: Query-Conditioned Generative Search with QGS in Quark
2026-05-25 • Information Retrieval
AI summary
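The authors present Query-Conditioned Generative Search (QGS), which improves search ranking by conditioning each next-item prediction on the query that produced it, so that intents from different queries are not mixed into one prediction target. To encode long user histories within online latency budgets, they introduce a Linear HSTU encoder that replaces full attention with causal linear recurrence, cutting per-layer cost from O(L^2) to O(L) with no reported loss in ranking quality. To retain traditional hand-crafted features such as text-matching scores, they propose HFG-Attention, which organizes heterogeneous features into semantic groups and fuses them through a dedicated attention block. Deployed in the ranking module of Quark Search, QGS showed statistically significant gains in online A/B tests: +0.62% CTR, +0.38% Click-Search Ratio, and +3.55% PV Duration over the production deep learning baseline.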
Generative sequence models, Search ranking, Query conditioning, Attention mechanism, Linear recurrence, Heterogeneous features, Click-through rate, User interaction history, A/B testing, Deep learning
Authors
Yanglong Song, Zihao Yang, Shuo Meng, Rujun Guo, Jin Zhang, Bin Wang, Shaoyu Liu, Xiaozhao Wang, Guanjun Jiang
Abstract
Generative sequence models have shown strong results in recommendation. Applying them to search ranking is more challenging. Search behavior is inherently query-driven. Each query switch introduces a sharp topic shift in the user's interaction history. Existing generative methods flatten queries and items into a single token sequence. They do not distinguish query boundaries. This causes the model to mix different query intents into one prediction target, resulting in noisy supervision. We present Query-Conditioned Generative Search (QGS). QGS encodes each interaction as a (query, item) pair token. It trains with a query-conditioned next-item objective. The prediction target changes from a noisy marginal P(item_{t+1}|context_{<=t}) to a clean conditional P(item_{t+1}|context_{<=t}, query_{t+1}). This directly removes the semantic discontinuity caused by query switches. Encoding long interaction histories with standard attention has quadratic cost. This is impractical under strict online latency budgets. We introduce a Linear HSTU encoder. It replaces full attention with causal linear recurrence. Per-layer complexity drops from O(L^2) to O(L) with no loss in ranking quality. Traditional search ranking depends on hand-crafted features like text-matching scores, statistical signals, and behavioral features. We propose HFG-Attention to preserve them in the generative framework. It organizes heterogeneous features into semantic groups and fuses them through a dedicated attention block. This bridges sparse engineered signals with dense sequential representations. QGS is deployed in the ranking module of Quark Search, a major commercial search engine in China. Online A/B tests show statistically significant gains: +0.62% CTR, +0.38% Click-Search Ratio, and +3.55% PV Duration over the production deep learning baseline.
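The drop from O(L^2) to O(L) per layer can be illustrated with generic causal linear attention. The sketch below is not the paper's Linear HSTU encoder, whose details the abstract does not give. It assumes unnormalized dot-product scores with no softmax and no feature map, and the function names and toy inputs are hypothetical. Under those assumptions, a running state that accumulates key-value outer products reproduces the causally masked quadratic form exactly, visiting each position once.

```python
# Generic causal linear attention, shown for illustration only.
# Assumption: scores are plain dot products (no softmax, no feature map),
# which is what makes the two forms below exactly equivalent.
# This is NOT the paper's Linear HSTU encoder.

def causal_quadratic(q, k, v):
    """Reference form: out[t] = sum over s <= t of (q[t] . k[s]) * v[s]. O(L^2) time."""
    out = []
    for t in range(len(q)):
        acc = [0.0] * len(v[0])
        for s in range(t + 1):  # causal mask: only positions up to t
            score = sum(a * b for a, b in zip(q[t], k[s]))
            for j, v_sj in enumerate(v[s]):
                acc[j] += score * v_sj
        out.append(acc)
    return out


def causal_linear(q, k, v):
    """Recurrent form: state = sum over s <= t of outer(k[s], v[s]). O(L) time."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed size, independent of L
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # Fold the current position into the state before reading it,
        # so position t attends to itself and everything before it.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * v_t[j]
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out


def max_abs_diff(a, b):
    """Largest elementwise gap between two equally shaped nested lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Toy sequence of length 3 with 2-dimensional queries, keys, and values.
q = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
k = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

gap = max_abs_diff(causal_quadratic(q, k, v), causal_linear(q, k, v))
```

The state has a fixed size of d_k by d_v regardless of history length, which is why the recurrent form suits long interaction histories under a latency budget. Practical linear-attention encoders usually add gating, normalization, or a feature map on top of this recurrence; none of those are modeled here.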