Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision

2026-04-10 · Computation and Language

Computation and Language · Artificial Intelligence · Information Retrieval · Machine Learning
AI summary

The authors study how models can decide whether given evidence actually supports a specific claim, especially in complex domains such as radiology. They propose a setup in which the model sees a specific case, a piece of evidence, and a claim, and must decide whether the evidence backs the claim for that case. To train the model, they automatically generate both supportive and deliberately tricky non-supportive examples, without any manual evidence labeling. The resulting model clearly relied on the evidence to make its decisions, and this behavior largely carried over to unseen evidence articles and an external case distribution, though performance dropped when the evidence source shifted. The authors conclude that the main challenge is not just building capable models but creating supervision that truly captures the role of evidence in reasoning.

evidence-grounded reasoning, supervision, radiology, evidence verification, counterfactual examples, support examples, model evaluation, causal role of evidence, structured claim, evidence dependence
Authors
Soroosh Tayebi Arasteh, Mehdi Joodaki, Mahshad Lotfinia, Sven Nebelung, Daniel Truhn
Abstract
Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the provided evidence supports the target claim. In practice, this often fails because supervision is weak, evidence is only loosely tied to the claim, and evaluation does not test evidence dependence directly. We introduce case-grounded evidence verification, a general framework in which a model receives a local case context, external evidence, and a structured claim, and must decide whether the evidence supports the claim for that case. Our key contribution is a supervision construction procedure that generates explicit support examples together with semantically controlled non-support examples, including counterfactual wrong-state and topic-related negatives, without manual evidence annotation. We instantiate the framework in radiology and train a standard verifier on the resulting support task. The learned verifier substantially outperforms both case-only and evidence-only baselines, remains strong under correct evidence, and collapses when evidence is removed or swapped, indicating genuine evidence dependence. This behavior transfers across unseen evidence articles and an external case distribution, though performance degrades under evidence-source shift and remains sensitive to backbone choice. Overall, the results suggest that a major bottleneck in evidence grounding is not only model capacity, but the lack of supervision that encodes the causal role of evidence.
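The supervision construction described in the abstract — a support example paired with a counterfactual wrong-state negative and a topic-related negative — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Claim` structure, the `(finding, state)`-keyed evidence index, and all field names are hypothetical placeholders.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    finding: str  # hypothetical structured claim, e.g. "pleural effusion"
    state: str    # "present" or "absent"

def make_examples(case_context, evidence_by_finding, claim, rng=None):
    """Sketch of the supervision scheme: one support example plus two
    semantically controlled non-support negatives, with no manual labels."""
    rng = rng or random.Random(0)
    examples = []
    # Support example: evidence matching the claim's finding and state.
    examples.append({
        "case": case_context,
        "evidence": evidence_by_finding[(claim.finding, claim.state)],
        "claim": claim,
        "label": "support",
    })
    # Counterfactual wrong-state negative: same finding, flipped state.
    flipped = "absent" if claim.state == "present" else "present"
    examples.append({
        "case": case_context,
        "evidence": evidence_by_finding[(claim.finding, flipped)],
        "claim": claim,
        "label": "non-support",
    })
    # Topic-related negative: evidence about a different finding.
    other = rng.choice(sorted({f for (f, _) in evidence_by_finding
                               if f != claim.finding}))
    examples.append({
        "case": case_context,
        "evidence": evidence_by_finding[(other, claim.state)],
        "claim": claim,
        "label": "non-support",
    })
    return examples
```

The point of the two negative types is that both are superficially plausible (same finding or same topic as the claim), so a verifier can only separate them from the support example by actually reading the evidence against the case.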