Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration

2026-05-08 • Computation and Language


AI summary

The authors address the problem of making answers from Knowledge Graph Question Answering (KGQA) systems reliable without making the returned answer sets too broad. They propose Conformal Path Reasoning (CPR), which changes how candidate answers are calibrated and scored. Calibration is performed per query over scores assigned to reasoning paths, and a lightweight network called the Residual Conformal Value Network (RCVNet) learns those path scores so that correct and incorrect paths are easier to tell apart. In their experiments, CPR meets its coverage guarantee more reliably than conformal baselines, meaning the prediction set contains a correct answer at the promised rate, while returning smaller answer sets.

Knowledge Graph Question Answering, Conformal Prediction, Calibration, Coverage Guarantee, Prediction Sets, Nonconformity Scores, PUCT-guided Exploration, Residual Conformal Value Network, Empirical Coverage Rate, Path-level Scoring

Authors

Shuhang Lin, Chuhao Zhou, Xiao Lin, Zihan Dong, Kuan Lu, Zhencan Peng, Jie Yin, Dimitris N. Metaxas

Abstract

Knowledge Graph Question Answering (KGQA) has shown promise for grounded and interpretable reasoning, yet existing approaches often fail to provide reliable coverage guarantees over retrieved answers. While Conformal Prediction (CP) offers a principled framework for producing prediction sets with statistical guarantees, prior methods suffer from critical limitations in both calibration validity and score discriminability, resulting in violated coverage guarantees and excessively large prediction sets. To address these limitations, we propose Conformal Path Reasoning (CPR), a trustworthy KGQA framework with two key innovations. First, we perform query-level conformal calibration over path-level scores, which preserves exchangeability while producing path-level prediction sets. Second, we introduce the Residual Conformal Value Network (RCVNet), a lightweight module trained via PUCT-guided exploration to learn discriminative path-level nonconformity scores. Experiments on KGQA benchmarks show that CPR improves the Empirical Coverage Rate by 34% while reducing the average prediction set size by 40% compared to conformal baselines. These results show that CPR satisfies its coverage guarantees with substantially more compact answer sets.
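The abstract does not spell out the calibration procedure. The sketch below illustrates one plausible reading of "query-level conformal calibration over path-level scores" using standard split conformal prediction. Each calibration query contributes a single score (the lowest nonconformity among its gold paths), so scores stay exchangeable across queries even though paths within one query are dependent. The data layout and the names `query_level_score`, `calibrate`, `prediction_set`, and `evaluate` are hypothetical; the path scores stand in for whatever RCVNet would output, and a query is assumed to be "covered" when at least one gold path enters its set.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical representation of one query:
#   path_scores: candidate path id -> nonconformity score (lower = more plausible)
#   gold: ids of the paths that lead to a correct answer
Query = Tuple[Dict[str, float], Set[str]]


def query_level_score(path_scores: Dict[str, float], gold: Set[str]) -> float:
    """Collapse one query's path scores into a single calibration score.

    Assumption: a query counts as covered if any gold path is in the set, so the
    relevant score is the lowest nonconformity among its gold paths. If no gold
    path was retrieved, the query can never be covered, so its score is inf.
    """
    return min((path_scores[p] for p in gold if p in path_scores), default=math.inf)


def calibrate(calibration: List[Query], alpha: float) -> float:
    """Split-conformal threshold computed from one score per calibration query."""
    scores = sorted(query_level_score(s, g) for s, g in calibration)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    # With too few calibration queries for this alpha, the threshold is unbounded.
    return math.inf if k > n else scores[k - 1]


def prediction_set(path_scores: Dict[str, float], q_hat: float) -> Set[str]:
    """Keep every candidate path whose nonconformity does not exceed the threshold."""
    return {p for p, s in path_scores.items() if s <= q_hat}


def evaluate(test: List[Query], q_hat: float) -> Tuple[float, float]:
    """Return (empirical coverage rate, average prediction-set size) over test queries."""
    sets = [prediction_set(s, q_hat) for s, _ in test]
    covered = sum(1 for ps, (_, gold) in zip(sets, test) if ps & gold)
    return covered / len(test), sum(len(ps) for ps in sets) / len(test)


if __name__ == "__main__":
    # Toy, made-up scores purely for illustration.
    calibration = [
        ({"a": 0.1, "b": 0.9}, {"a"}),
        ({"a": 0.4, "b": 0.2}, {"a"}),
        ({"a": 0.3}, {"a"}),
        ({"a": 0.7, "b": 0.5}, {"b"}),
    ]
    q_hat = calibrate(calibration, alpha=0.25)  # 0.5
    test = [
        ({"a": 0.2, "b": 0.45, "c": 0.8}, {"b"}),
        ({"a": 0.6, "b": 0.55}, {"a"}),
    ]
    print(evaluate(test, q_hat))  # (0.5, 1.0)
```

Under exchangeable calibration and test queries, the standard split-conformal argument gives coverage of at least 1 - alpha for this threshold. Treating every path as its own calibration point would instead mix dependent scores from the same query, which is one way calibration validity can break.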
