PathRouter: Aligning Rewards with Retrieval Quality in Agentic Graph Retrieval-Augmented Generation

2026-06-15 Computation and Language

AI summary

The authors study Agentic GraphRAG, an approach in which language models find answers by searching through networks of information step by step. They identify two problems: the model sometimes finds shortcuts that yield correct answers while ignoring important evidence, and a single overall feedback score does not show which search steps should change. To address both, they propose PathRouter, a training framework that judges search paths on both answer accuracy and how well the path matches useful evidence. Their tests show that PathRouter helps models give better answers and retrieve more relevant evidence than a strong baseline.

Agentic GraphRAG, reinforcement learning, graph-structured evidence, reward aliasing, search-update ambiguity, trajectory-level feedback, KL guidance, question answering, F1 score, evidence-path overlap
Authors
Bo Wang, Heyan Huang, Yaolin Li, Wei Tang, Yuan Zhang, Wenbo Li, Mingze Gao, Ge Shi, Chong Feng
Abstract
Agentic GraphRAG trains language-model agents to iteratively retrieve and reason over graph-structured evidence, enabling more accurate and context-aware decision-making by efficiently navigating complex information networks. However, outcome-only reinforcement learning suffers from answer-path reward aliasing, where correct answers may come from shortcuts rather than useful evidence paths. It also exhibits search-update ambiguity, as scalar trajectory-level feedback does not indicate which retrieval actions to adjust. To mitigate these shortcomings, we present PathRouter, a path-aware training framework for agentic GraphRAG. PathRouter jointly evaluates each trajectory along answer correctness and evidence-path overlap, yielding four trajectory categories with differentiated GRPO advantage scaling that suppresses shortcut reinforcement while preserving evidence-seeking behavior. For evidence-poor trajectories, a frozen gold-evidence teacher provides token-level KL guidance on reasoning and search-query tokens, excluding answer tokens to avoid direct response imitation. Experiments on six QA benchmarks across three model sizes show that PathRouter consistently improves answer F1 and evidence-path overlap, achieving average F1 gains of 3.1 on 3B and 4.9 on 7B models compared to a strong baseline.
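
The two mechanisms named in the abstract, four-way trajectory categorization with differentiated advantage scaling and a KL mask that skips answer tokens, can be illustrated with a minimal sketch. It is not the authors' implementation: the thresholds, the scale factors, the token role labels, and every function and class name below are hypothetical, since the abstract gives no values. The group-relative normalization (reward minus group mean, divided by group standard deviation) follows the usual GRPO formulation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    answer_f1: float     # answer correctness score in [0, 1]
    path_overlap: float  # overlap of the retrieved path with gold evidence, in [0, 1]


# Hypothetical thresholds and scale factors; the abstract does not specify them.
ANSWER_THRESHOLD = 0.5
OVERLAP_THRESHOLD = 0.5
ADVANTAGE_SCALE = {
    ("correct", "evidence_rich"): 1.0,  # right answer, supported by evidence
    ("correct", "evidence_poor"): 0.3,  # likely shortcut: damp the reward
    ("wrong", "evidence_rich"): 0.5,    # soften the penalty for useful search
    ("wrong", "evidence_poor"): 1.0,    # full penalty
}


def categorize(t: Trajectory) -> tuple[str, str]:
    """Assign one of four categories from answer correctness and path overlap."""
    answer = "correct" if t.answer_f1 >= ANSWER_THRESHOLD else "wrong"
    evidence = "evidence_rich" if t.path_overlap >= OVERLAP_THRESHOLD else "evidence_poor"
    return answer, evidence


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one sampled group, as in GRPO."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-6) for r in rewards]


def scaled_advantages(group: list[Trajectory]) -> list[float]:
    """Scale each group-relative advantage by its category factor."""
    base = group_relative_advantages([t.answer_f1 for t in group])
    return [a * ADVANTAGE_SCALE[categorize(t)] for a, t in zip(base, group)]


def kl_guidance_mask(token_roles: list[str]) -> list[float]:
    """Keep reasoning and search-query tokens; drop answer and all other tokens."""
    return [1.0 if role in ("reasoning", "search_query") else 0.0 for role in token_roles]


def masked_mean(values: list[float], mask: list[float]) -> float:
    """Average per-token values (e.g. per-token KL) over unmasked positions only."""
    kept = sum(mask)
    return sum(v * m for v, m in zip(values, mask)) / kept if kept else 0.0
```

Under these assumed factors, a correct but evidence-poor trajectory still receives a positive advantage, only a smaller one than a correct, evidence-rich trajectory in the same group. The mask ensures that any teacher guidance term is averaged over reasoning and search-query tokens alone, so the answer span is never imitated directly.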