Task-Adaptive Embedding Refinement via Test-time LLM Guidance
2026-05-12 • Computation and Language
Computation and Language • Information Retrieval • Machine Learning
AI summary
The authors investigate a way to improve search and classification by using a large language model (LLM) to refine the embedding of a user query at query time. Their method uses feedback from the LLM on a small set of retrieved documents to adjust the query representation in real time, making the embedding better suited to the specific task. Tested across a range of challenging benchmarks, the approach yields consistent improvements in retrieval accuracy and classification performance. This lets embedding models work better without running an expensive LLM over the entire corpus. The authors release their code to support further research.
large language model (LLM) • embedding models • query refinement • zero-shot learning • search ranking • classification • generative feedback • embedding space • real-time adaptation • benchmark evaluation
Authors
Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo
Abstract
We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling embeddings to adapt in real time to the target task. We conduct extensive experiments with state-of-the-art text embedding models across a diverse set of challenging search and classification benchmarks. Empirical results indicate that LLM-guided query refinement yields consistent gains across all models and datasets, with relative improvements of up to +25% in literature search, intent detection, key-point matching, and nuanced query-instruction following. The refined queries improve ranking quality and induce clearer binary separation across the corpus, enabling the embedding space to better reflect the nuanced, task-specific constraints of each ad-hoc user query. Importantly, this expands the range of practical settings in which embedding models can be effectively deployed, making them a compelling alternative when costly LLM pipelines are not viable at corpus scale. We release our experimental code for reproducibility at https://github.com/IBM/task-aware-embedding-refinement.
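The abstract does not spell out the refinement mechanism, but the core idea it describes — nudging a query embedding using LLM relevance feedback on a handful of documents — can be sketched as a Rocchio-style update. The snippet below is a minimal illustration, not the authors' implementation (see their repository for that); the function name `refine_query_embedding`, the binary relevance labels, and the `alpha`/`beta`/`gamma` weights are all assumptions made for the sketch.

```python
import numpy as np

def refine_query_embedding(query_emb, doc_embs, llm_relevant,
                           alpha=0.7, beta=0.3, gamma=0.1):
    """Rocchio-style refinement of a query embedding from LLM feedback.

    query_emb:    (d,) unit-normalized query embedding
    doc_embs:     (k, d) embeddings of a small set of retrieved documents
    llm_relevant: (k,) boolean array, True where an LLM judged the
                  document relevant to the query's task-specific intent
    """
    rel = doc_embs[llm_relevant]
    nonrel = doc_embs[~llm_relevant]
    refined = alpha * query_emb
    if len(rel):                      # pull toward LLM-approved documents
        refined = refined + beta * rel.mean(axis=0)
    if len(nonrel):                   # push away from rejected documents
        refined = refined - gamma * nonrel.mean(axis=0)
    return refined / np.linalg.norm(refined)

# Hypothetical usage: re-rank a corpus with the refined query embedding.
# Only the k feedback documents ever touch the LLM; the corpus itself is
# scored with cheap dot products.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, N = 8, 4, 100
    q = rng.normal(size=d)
    q /= np.linalg.norm(q)
    docs = rng.normal(size=(k, d))
    labels = np.array([True, False, True, False])  # stand-in for LLM judgments
    q_refined = refine_query_embedding(q, docs, labels)
    corpus_embs = rng.normal(size=(N, d))
    scores = corpus_embs @ q_refined  # rank corpus by similarity to refined query
```

Under this reading, the per-query LLM cost is bounded by the small feedback set rather than the corpus size, which is the deployment advantage the abstract highlights.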