VeriGraph: Towards Verifiable Data-Analytic Agents

2026-06-15

Computation and Language, Artificial Intelligence
AI summary

The authors created VeriGraph, a system that helps large language model agents organize their reasoning steps clearly when analyzing data. Instead of mixing calculations and natural-language claims in a single stream of text, VeriGraph builds a graph that explicitly links raw data, computations, and conclusions. This makes it easier to trace where an answer comes from and to judge whether it is correct. The authors report that VeriGraph-8B achieves the highest overall score on four benchmarks and produces evidence graphs whose claims are more strongly grounded than those of previous methods.

Large Language Models, Neuro-symbolic reasoning, Directed Acyclic Graph (DAG), Traceability, Evidence graph, Computational integrity, Semantic grounding, Policy optimization, Claim-level evidence, Data analytics
Authors
Jiajie Jin, Zhao Yang, Wenle Liao, Yuyang Hu, Guanting Dong, Xiaoxi Li, Yutao Zhu, Zhicheng Dou
Abstract
LLM-based agents have demonstrated strong capabilities in data-intensive analytical tasks, yet their outputs are rarely verifiable: a reliance on linear text trajectories makes their reasoning difficult to audit. In particular, deterministic computations over raw data and semantic deductions over natural-language claims are often entangled in an unstructured stream, leaving numerical conclusions hard to reproduce and qualitative judgments hard to inspect. To address this, we propose VeriGraph, a traceable neuro-symbolic reasoning framework that enables agents to construct an explicit heterogeneous evidence directed acyclic graph (DAG) during execution. VeriGraph introduces three evidence-expansion primitives, namely computational, grounding, and derivational expansion, to connect raw data, interpreter variables, computed results, and natural-language claims in a unified graph. Under this formulation, structural traceability is reduced to graph reachability from raw data sources to terminal claims, while semantic support is measured by claim-level evidence evaluation. To improve graph construction, we further design a graph-based policy optimization strategy with a composite reward that jointly supervises answer correctness, computational integrity, and derivational coherence. Experiments on four benchmarks show that VeriGraph-8B outperforms all baselines in overall score. More importantly, VeriGraph produces auditable evidence graphs with substantially stronger claim grounding, achieving an 87.61% Grounding Rate under our claim-level evidence support evaluation. These results suggest that explicit evidence-graph construction is a promising path toward verifiable data-analytic agents. Our code is available at https://github.com/ignorejjj/VeriGraph.
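
The abstract's reduction of structural traceability to graph reachability can be illustrated with a small sketch. This is not the VeriGraph implementation: the `EvidenceGraph` class, the node-kind names, and the choice of which expansion labels which edge are all hypothetical, chosen only to mirror the four element types and three expansion primitives named above. It also assumes the weakest reading of traceability, namely that a terminal claim is traceable if at least one raw data source reaches it.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Hypothetical node kinds, mirroring the four element types in the abstract.
NODE_KINDS = {"raw_data", "variable", "result", "claim"}
# The three expansion primitives from the abstract, used here as edge labels.
EXPANSIONS = {"computational", "grounding", "derivational"}


@dataclass
class EvidenceGraph:
    kinds: dict = field(default_factory=dict)  # node id -> node kind
    edges: dict = field(default_factory=lambda: defaultdict(list))  # src -> [(dst, label)]

    def add_node(self, node_id, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.kinds[node_id] = kind

    def expand(self, src, dst, expansion):
        """Add a labeled edge src -> dst, refusing anything that breaks the DAG."""
        if expansion not in EXPANSIONS:
            raise ValueError(f"unknown expansion: {expansion}")
        if src not in self.kinds or dst not in self.kinds:
            raise KeyError("both endpoints must be added before expanding")
        # If dst already reaches src, the new edge would close a cycle.
        if src in self.reachable_from([dst]):
            raise ValueError("edge would create a cycle")
        self.edges[src].append((dst, expansion))

    def reachable_from(self, sources):
        """Breadth-first search; returns every node reachable from the sources."""
        seen = set(sources)
        queue = deque(sources)
        while queue:
            node = queue.popleft()
            for nxt, _label in self.edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def terminal_claims(self):
        """Claims with no outgoing edges."""
        return [n for n, k in self.kinds.items()
                if k == "claim" and not self.edges.get(n)]

    def untraceable_claims(self):
        """Terminal claims that no raw data source can reach."""
        raw = [n for n, k in self.kinds.items() if k == "raw_data"]
        reached = self.reachable_from(raw)
        return [c for c in self.terminal_claims() if c not in reached]


# Toy example; file name, variables, and edge labels are made up.
g = EvidenceGraph()
g.add_node("sales.csv", "raw_data")
g.add_node("df", "variable")
g.add_node("mean_q3", "result")
g.add_node("c1", "claim")  # a claim backed by a computed result
g.add_node("c2", "claim")  # a claim with no supporting evidence
g.expand("sales.csv", "df", "computational")
g.expand("df", "mean_q3", "computational")
g.expand("mean_q3", "c1", "grounding")
print(g.untraceable_claims())  # ['c2']
```

A check like this covers only the structural half of the picture. Whether the evidence reaching a claim actually supports it is the separate, semantic question that the abstract assigns to claim-level evidence evaluation.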