Beyond Query Memorization: Large Language Model Routing with Query Decomposition and Historical Matching

2026-05-25

Artificial Intelligence
AI summary

The authors study LLM routing, which means deciding which large language model (LLM) should answer a given query so that accuracy is balanced against cost. They point out that existing routers map queries directly to models using surface-level features. As a result, these routers tend to memorize those features and generalize poorly to out-of-distribution data. To address this, the authors propose DecoR. DecoR routes a new query by matching it to similar past queries from historical logs, and it compares queries on the capabilities the task requires rather than on surface wording. They also build CodaSet, a benchmark for evaluating how well routing methods generalize. Their experiments show that DecoR maintains high accuracy on both in-distribution and out-of-distribution data while substantially lowering inference cost.

Large Language Models, Routing Methods, Out-of-Distribution Data, Query Matching, Memorization Trap, Inference Cost, Predictive Performance, Benchmark Dataset, Capability Deconstruction, CodaSet
Authors
Bo Lv, Jingbo Sun
Abstract
Optimizing the trade-off between predictive performance and computational cost is a central concern in deploying Large Language Models (LLMs). Current routing methods primarily rely on a direct mapping from queries to models based on surface-level features. This makes them susceptible to the memorization trap and leads to poor generalization on out-of-distribution (OOD) data. In this paper, we propose DecoR, a novel routing framework that recasts routing as a matching process that retrieves similar queries from historical logs, effectively mitigating the memorization trap. To improve matching accuracy, we introduce a query capability deconstruction method that decouples linguistic surface forms from task-intrinsic requirements. Matching therefore operates on capability dimensions, and routing decisions are grounded in essential task attributes. Furthermore, we develop CodaSet, a comprehensive benchmark for assessing routing generalization. Experimental results demonstrate that DecoR maintains superior accuracy while substantially lowering inference costs in both in-distribution and OOD settings. All code and data are available at https://github.com/lvbotenbest/DecoR.
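The abstract describes routing as matching a new query against historical logs, with the matching done on capability dimensions rather than on surface wording. Below is a minimal sketch of that general idea under stated assumptions. It is not DecoR's implementation, which is in the linked repository. The sketch makes the following assumptions:
- Each logged query already carries a capability profile, which is a sparse vector over illustrative dimensions such as `arithmetic` or `recall`.
- The router retrieves the k most similar past queries by cosine similarity.
- The router picks the model whose accuracy on those neighbors, minus a linear cost penalty `lam * cost`, is highest.

The names `LogEntry` and `route`, the capability dimensions, the per-model costs, and the scoring rule are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One historical query, stored by its (hypothetical) capability profile."""
    capabilities: dict[str, float]  # e.g. {"arithmetic": 1.0, "multi_step": 0.9}
    correct: dict[str, bool]        # whether each candidate model answered it correctly


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse capability vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query_caps: dict[str, float], logs: list[LogEntry],
          costs: dict[str, float], k: int = 2, lam: float = 0.1) -> str:
    """Pick the model with the best neighbor accuracy minus an assumed cost penalty."""
    if not logs:
        raise ValueError("routing by matching needs at least one historical log entry")
    # Retrieve the k past queries whose capability profiles are most similar.
    neighbors = sorted(logs, key=lambda e: cosine(query_caps, e.capabilities),
                       reverse=True)[:k]
    best_model, best_score = None, -math.inf
    for model, cost in costs.items():
        # Estimated accuracy: fraction of matched past queries this model solved.
        acc = sum(e.correct.get(model, False) for e in neighbors) / len(neighbors)
        score = acc - lam * cost
        if score > best_score:
            best_model, best_score = model, score
    return best_model


# Toy historical log with hypothetical profiles and outcomes.
LOGS = [
    LogEntry({"arithmetic": 1.0, "multi_step": 1.0}, {"small": False, "large": True}),
    LogEntry({"arithmetic": 1.0, "multi_step": 0.9}, {"small": False, "large": True}),
    LogEntry({"recall": 1.0}, {"small": True, "large": True}),
    LogEntry({"recall": 0.9, "summarization": 0.3}, {"small": True, "large": True}),
]
COSTS = {"small": 0.1, "large": 1.0}  # assumed relative inference costs

print(route({"recall": 1.0}, LOGS, COSTS))                         # small
print(route({"arithmetic": 1.0, "multi_step": 0.9}, LOGS, COSTS))  # large
```

In this sketch, the routing decision comes from the recorded outcomes of matched neighbors rather than from a learned direct query-to-model mapping. Adding log entries therefore changes routing without any retraining. How DecoR actually derives capability profiles and scores candidate models is specified in the paper, not here.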