SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

2026-05-04 · Artificial Intelligence

AI summary

The authors focus on improving how each intermediate step is checked when answering questions over knowledge graphs, especially in sensitive areas like medicine and law. They found that existing methods sometimes give good scores to reasoning paths even if some steps are wrong, because later correct steps offset earlier mistakes. To fix this, they created a new model called SCPRM that evaluates each step in the context of the steps before it and predicts how close that step is to the final goal, guiding the reasoning more safely. They also combined SCPRM with a search technique, Monte Carlo Tree Search, to improve question-answering performance, showing more accurate and safer reasoning results.

large language models · knowledge graphs · reasoning paths · process reward models · risk compensation effect · schema distance · Monte Carlo Tree Search · multi-hop reasoning · question answering
Authors
Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong
Abstract
Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, in which incorrect steps are offset by later correct ones, so that flawed reasoning paths receive high rewards. The issue is exacerbated in knowledge graph (KG) reasoning: multiple paths may connect the start and end entities, and a single risky step can render an entire reasoning path flawed. These limitations are especially problematic in risk-sensitive tasks such as medical and legal KG reasoning. To address them, we propose a Schema-aware Cumulative Process Reward Model (SCPRM) that evaluates reasoning paths by conditioning on the reasoning prefix and incorporating the schema distance between the current reasoning step and the implicit target parsed from the query, providing cumulative and future rewards that guide path exploration. We further integrate SCPRM into Monte Carlo Tree Search (MCTS) as SCPRM-MCTS to conduct multi-hop reasoning on KGs for question answering (QA) tasks. Across medical and legal KGQA benchmarks and CWQ, SCPRM-MCTS improves Hits@k by an average of 1.18% over strong baselines, demonstrating more accurate and risk-sensitive reasoning evaluation.
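
To make the two reward components concrete, here is a minimal Python sketch of how a prefix-conditioned cumulative reward and a schema-distance-based future reward might be combined. The toy schema graph, `schema_distance`, `scprm_score`, and the mixing weight `beta` are illustrative assumptions, not the paper's actual model: SCPRM learns its step rewards by conditioning on the reasoning prefix, whereas this sketch takes the per-step scores as given.

```python
import math

# Hypothetical toy medical schema: which entity types are reachable from
# which. The paper's schema distance is defined on the KG schema; here we
# approximate it as shortest-path distance between entity types.
SCHEMA_EDGES = {
    "Disease": ["Symptom", "Drug"],
    "Drug": ["Disease", "SideEffect"],
    "Symptom": ["Disease"],
    "SideEffect": ["Drug"],
}

def schema_distance(current_type: str, target_type: str) -> int:
    """BFS shortest-path distance between two entity types on the schema graph."""
    if current_type == target_type:
        return 0
    frontier, seen, dist = [current_type], {current_type}, 0
    while frontier:
        dist += 1
        nxt = []
        for t in frontier:
            for nbr in SCHEMA_EDGES.get(t, []):
                if nbr == target_type:
                    return dist
                if nbr not in seen:
                    seen.add(nbr)
                    nxt.append(nbr)
        frontier = nxt
    return len(SCHEMA_EDGES)  # unreachable: maximum penalty

def scprm_score(step_rewards, current_type, target_type, beta=0.5):
    """Combine a cumulative prefix reward with a schema-based future reward.

    step_rewards: per-step scores in [0, 1], each (in the paper) conditioned
    on the reasoning prefix. The cumulative term is a product, so a single
    risky step suppresses the whole path rather than being averaged away by
    later correct steps (no risk compensation). The future term rewards
    steps whose type is close to the implicit target type from the query.
    """
    cumulative = math.prod(step_rewards) if step_rewards else 1.0
    future = math.exp(-schema_distance(current_type, target_type))
    return (1 - beta) * cumulative + beta * future

# Example: one risky step (0.2) drags the product down even though the
# surrounding steps look good, unlike a mean, which masks it.
path = [0.9, 0.2, 0.95]
print(f"mean reward: {sum(path) / len(path):.3f}")                    # ~0.683
print(f"SCPRM score: {scprm_score(path, 'Drug', 'SideEffect'):.3f}")  # ~0.269
```

In SCPRM-MCTS, a score of this shape could serve as the value estimate used during node selection and backpropagation, steering the search away from prefixes that contain risky steps while still crediting prefixes that are moving toward the target's schema type; the exact reward formulation and its integration with MCTS follow the paper, which the sketch above only approximates.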