Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki

2026-05-25 · Computation and Language

AI summary

The authors propose LLM-Wiki, a retrieval system that lets language models treat retrieval as a reasoning process rather than a one-shot lookup. Instead of storing knowledge as isolated chunks, LLM-Wiki compiles documents into Wiki pages connected by bidirectional links, and the model searches, reads, and follows links through tool calls. An Error Book records and corrects structural and semantic mistakes over time. The approach outperforms previous methods on several question-answering benchmarks, especially on multi-step and multi-document questions.

Retrieval-Augmented Generation (RAG), Language Model Agents, Multi-hop Reasoning, Knowledge Graph, Tool-Use in AI, HotpotQA, Wiki Structure, Error Correction, Multi-document QA, Benchmark Evaluation
Authors
Haoliang Ming, Feifei Li, Xiaoqing Wu, Wenhui Que
Abstract
LLM agents require retrieval to behave less like one-shot context fetching and more like reasoning: searching, reading, traversing, and deciding when evidence is sufficient. However, Retrieval-Augmented Generation (RAG) typically organizes external knowledge as flat chunks retrieved by embedding similarity, exposing a retrieval-as-lookup interface that is poorly aligned with tool-using agents. We propose LLM-Wiki, an agent-native retrieval system that operationalizes the Retrieval-as-Reasoning paradigm by treating external knowledge as a compilable, composable, and self-evolving structure rather than a static retrieval index. LLM-Wiki compiles documents into structured Wiki pages with bidirectional links, exposes search, read, and link-following operations through standard tool-calling interfaces, and introduces an Error Book for persistent structural and semantic self-correction. On HotpotQA, MuSiQue, and 2WikiMultiHopQA, LLM-Wiki outperforms seven baselines, including HippoRAG 2, LightRAG, and GraphRAG, with gains of 2.0-8.1 F1 points over the strongest graph-based baseline and larger gains over Dense RAG. On AuthTrace, LLM-Wiki achieves the best overall accuracy, with especially strong gains on multi-document structured queries, showing that compilation-based knowledge organization generalizes beyond chain-style multi-hop reasoning.
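
To make the search, read, and link-following interface concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the `ToyWiki` class, the title-mention linking rule, the keyword-overlap scoring, and the shape of the `error_book` entries are all assumptions chosen for illustration, since the abstract does not specify how compilation, ranking, or the Error Book work internally.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    text: str
    links: set[str] = field(default_factory=set)      # outgoing links
    backlinks: set[str] = field(default_factory=set)  # incoming links


class ToyWiki:
    """Hypothetical in-memory stand-in for a compiled Wiki (illustration only)."""

    def __init__(self) -> None:
        self.pages: dict[str, Page] = {}
        # Assumed shape for a persistent correction log; the paper's Error Book
        # is not specified in the abstract.
        self.error_book: list[dict[str, str]] = []

    def compile(self, docs: dict[str, str]) -> None:
        # Assumption: one page per document, and page A links to page B
        # whenever B's title is mentioned in A's text.
        for title, text in docs.items():
            self.pages[title] = Page(title, text)
        for page in self.pages.values():
            for other in self.pages:
                if other != page.title and other.lower() in page.text.lower():
                    page.links.add(other)
                    self.pages[other].backlinks.add(page.title)

    def search(self, query: str, k: int = 3) -> list[str]:
        # Naive keyword-overlap ranking, standing in for a real retriever.
        terms = set(re.findall(r"\w+", query.lower()))
        scored = []
        for page in self.pages.values():
            words = set(re.findall(r"\w+", f"{page.title} {page.text}".lower()))
            score = len(terms & words)
            if score:
                scored.append((score, page.title))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [title for _, title in scored[:k]]

    def read(self, title: str) -> str:
        page = self.pages.get(title)
        if page is None:
            # Record the failed lookup so it could be repaired later.
            self.error_book.append({"kind": "missing_page", "title": title})
            return ""
        return page.text

    def follow_links(self, title: str) -> list[str]:
        page = self.pages.get(title)
        if page is None:
            self.error_book.append({"kind": "missing_page", "title": title})
            return []
        # Bidirectional traversal: outgoing links plus backlinks.
        return sorted(page.links | page.backlinks)


wiki = ToyWiki()
wiki.compile({
    "Ada Lovelace": "Ada Lovelace wrote notes on the Analytical Engine.",
    "Analytical Engine": "The Analytical Engine was designed by Charles Babbage.",
    "Charles Babbage": "Charles Babbage was an English mathematician.",
})
print(wiki.search("Who wrote notes on the engine?"))  # ['Ada Lovelace', 'Analytical Engine']
print(wiki.follow_links("Analytical Engine"))         # ['Ada Lovelace', 'Charles Babbage']
```

In an agent setting, each of the three methods would be registered as a separate tool, so the model decides at every step whether to search again, read a page, or hop along a link. A two-hop question such as "who designed the machine Ada Lovelace wrote about?" is answered here by searching for the Ada Lovelace page, following its link to the Analytical Engine page, and reading that page.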