SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions
2026-03-07 • Artificial Intelligence
Artificial Intelligence • Computation and Language • Cryptography and Security • Information Retrieval
AI summary
The authors study advanced AI systems that use large language models to autonomously fetch information and solve problems step-by-step. They explain these systems using a decision-making framework called Markov decision processes and organize them by how they plan, retrieve data, remember information, and use tools. The paper points out problems in current testing methods and risks like mistakes accumulating over time or wrong data being used. Finally, the authors suggest important future research areas to make these systems more reliable and easier to control.
Retrieval-Augmented Generation, Agentic systems, Markov decision processes, Control policies, Memory management, Evaluation methodologies, Hallucination in AI, Tool invocation, Dynamic retrieval, Sequential decision-making
Authors
Saroj Mishra, Suman Niroula, Umesh Yadav, Dilip Thakur, Srijan Gyawali, Shiva Gaire
Abstract
Retrieval-Augmented Generation (RAG) systems are increasingly evolving into agentic architectures where large language models autonomously coordinate multi-step reasoning, dynamic memory management, and iterative retrieval strategies. Despite rapid industrial adoption, current research lacks a systematic understanding of Agentic RAG as a sequential decision-making system, leading to highly fragmented architectures, inconsistent evaluation methodologies, and unresolved reliability risks. This Systematization of Knowledge (SoK) paper provides the first unified framework for understanding these autonomous systems. We formalize agentic retrieval-generation loops as finite-horizon partially observable Markov decision processes, explicitly modeling their control policies and state transitions. Building upon this formalization, we develop a comprehensive taxonomy and modular architectural decomposition that categorizes systems by their planning mechanisms, retrieval orchestration, memory paradigms, and tool-invocation behaviors. We further analyze the critical limitations of traditional static evaluation practices and identify severe systemic risks inherent to autonomous loops, including compounding hallucination propagation, memory poisoning, retrieval misalignment, and cascading tool-execution vulnerabilities. Finally, we outline key doctoral-scale research directions spanning stable adaptive retrieval, cost-aware orchestration, formal trajectory evaluation, and oversight mechanisms, providing a definitive roadmap for building reliable, controllable, and scalable agentic retrieval systems.
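To make the sequential decision-making view concrete, the following is a minimal sketch of a retrieval-generation loop with a fixed step budget (the finite horizon). It is not the authors' formalization. The toy corpus, the `AgentState` class, the hand-written `policy`, and the keyword matching are all hypothetical stand-ins chosen for illustration. The agent never sees the full corpus, only the passages it retrieves, which is the partial observability the abstract refers to.

```python
from dataclasses import dataclass, field

# Hypothetical toy corpus; not taken from the paper.
CORPUS = {
    "rag": "RAG augments a language model with retrieved passages.",
    "pomdp": "A POMDP models decisions made under partial observability.",
}


@dataclass
class AgentState:
    """What the agent can see: the query plus the evidence gathered so far."""
    query: str
    memory: list = field(default_factory=list)


def retrieve(term):
    """Stand-in retriever: exact-key lookup in the toy corpus."""
    return CORPUS.get(term)


def policy(state, pending):
    """Hand-written control policy: retrieve while terms remain, else answer."""
    if pending:
        return ("retrieve", pending[0])
    return ("answer", None)


def run_episode(query, horizon=4):
    """Run one finite-horizon episode and return (answer, trajectory)."""
    state = AgentState(query)
    # Terms still worth looking up (hypothetical keyword matching).
    pending = [t for t in CORPUS if t in query.lower()]
    trajectory = []
    for step in range(horizon):
        action, arg = policy(state, pending)
        trajectory.append((step, action, arg))
        if action == "answer":
            break
        passage = retrieve(arg)  # observation returned by the environment
        if passage is not None:
            state.memory.append(passage)  # memory update
        pending = pending[1:]
    # If the horizon runs out, the agent answers from whatever it has gathered.
    answer = " ".join(state.memory) or "No evidence retrieved."
    return answer, trajectory


if __name__ == "__main__":
    answer, trajectory = run_episode("How does RAG relate to a POMDP?")
    print(trajectory)
    print(answer)
```

Recording the full trajectory rather than only the final answer is a deliberate choice here: it is the kind of per-step trace that trajectory-level evaluation, as opposed to static answer-only evaluation, would need to inspect.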