RL-Index: Reinforcement Learning for Retrieval Index Reasoning

2026-06-15 • Information Retrieval

Information Retrieval, Artificial Intelligence, Machine Learning

AI summary

The authors created a method called RL-Index that improves retrieval by doing the reasoning during the indexing step, instead of waiting until someone asks a question. They use large language models to add rationales to documents, which help connect queries that need multi-step reasoning with the right documents. Reinforcement learning trains the model to write rationales that make the documents easier to retrieve. Their experiments on the BRIGHT benchmark show that RL-Index improves retrieval and question-answering accuracy while reducing query-time latency, and that it works with different retrievers and generators.

Information Retrieval, Reinforcement Learning, Indexing, Large Language Models, Query Rewriting, Retrieval Similarity, BRIGHT Benchmark, Policy Optimization, Document Augmentation, Question Answering

Authors

Yongjia Lei, Nedim Lipka, Zhisheng Qi, Utkarsh Sahu, Koustava Goswami, Franck Dernoncourt, Ryan A. Rossi, Yu Wang

Abstract

Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems relying on the same theorem or coding requiring deep reasoning). Existing approaches primarily rely on query-side reasoning (e.g., query rewriting), which introduces significant online latency and underutilizes the opportunity to perform reasoning over the knowledge corpus itself (i.e., index-side reasoning). In this paper, we propose RL-Index, an agentic indexing framework that formulates retrieval index reasoning as a reinforcement learning problem. Instead of performing reasoning at query time, RL-Index shifts reasoning to the indexing stage by augmenting documents with LLM-generated rationales that explicitly encode the latent query-knowledge relationship. To optimize the quality of these rationales, we employ Group Relative Policy Optimization (GRPO) and use retrieval similarity as a verifiable reward signal, enabling direct optimization of indexing decisions for retrieval effectiveness. Extensive experiments on the BRIGHT benchmark demonstrate that RL-Index consistently improves both retrieval and downstream question-answering performance, while significantly reducing online inference latency. Moreover, the learned rationale augmentation generalizes across diverse retrievers and generators, highlighting its robustness as a plug-and-play indexing strategy across different retrieval systems.
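The reward idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the reward for a sampled rationale is the cosine similarity between a query and the rationale-augmented document, and that GRPO advantages are the rewards normalized within one sampled group. The bag-of-words `embed` function stands in for a real retriever encoder, and all names (`retrieval_reward`, `group_relative_advantages`, the example texts) are hypothetical. The policy-gradient update itself is omitted.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real retriever encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_reward(query, document, rationale):
    """Reward: similarity between the query and the rationale-augmented document."""
    return cosine(embed(query), embed(document + " " + rationale))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical example: one document, one training query, three sampled rationales.
query = "which theorem bounds the sum of two sides of a triangle"
document = "For any vectors x and y, the norm of x + y is at most the norm of x plus the norm of y."
rationales = [
    "This is the triangle inequality theorem, which bounds one side of a triangle by the sum of the other two sides.",
    "This passage talks about vectors.",
    "A statement about norms in linear algebra.",
]

rewards = [retrieval_reward(query, document, r) for r in rationales]
advantages = group_relative_advantages(rewards)

for rationale, reward, advantage in zip(rationales, rewards, advantages):
    print(f"reward={reward:.3f} advantage={advantage:+.3f} {rationale[:40]}")
```

In this sketch, the rationale that names the underlying theorem scores highest, so it gets a positive advantage and would be reinforced, while the generic rationales get negative advantages. Because the reward is computed directly from retriever similarity, it can be checked without human labels on the rationales themselves.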
