When Confidence Takes the Wrong Path: Diagnosing Retrieval-State Lock-In in RAG

2026-06-22 | Computation and Language

Computation and Language, Artificial Intelligence
AI summary

The authors explain that retrieval-augmented generation (RAG) systems can give confident but wrong answers when repeated samples all rely on the same flawed retrieval state, a failure they call "retrieval-state lock-in." They show that agreement among repeated answers is not enough to trust a result, since the model may consistently agree on an incorrect response. By separating confidence into three parts (the answer, the retrieved evidence, and the retrieval state itself), they build a method that better detects risky answers. Accepting an answer only when all three checks pass raises pooled precision to 91.9%, against 69.7% when every answer is accepted, but it certifies only 7.7% of answers as low-risk. Trust in such systems therefore depends on checking where errors might come from, not just on whether answers match.

Retrieval-Augmented Generation (RAG), Uncertainty Estimation, Retrieval-State Lock-In, Answer Agreement, Knowledge-Graph RAG, Dense Retrieval, Confidence Calibration, Evidence Checking, Parametric Memory, Question Answering
Authors
Sahib Julka
Abstract
The trustworthiness of a retrieval-augmented generation (RAG) system depends on more than the answer it returns, yet many black-box uncertainty methods still read agreement among sampled answers as confidence. That inference fails when repeated samples condition on the same defective retrieval state. The state may be empty, with the model falling back on parametric memory, or populated by a coherent but wrong neighbourhood. In either case, the answers agree because the error is stable. The problem is recognised in deployed RAG, but it has lacked a name, a measurable signature, and a prevalence bound. We supply all three. We name the failure retrieval-state lock-in and diagnose it by separating the three objects a single confidence score conflates: the answer surface, the retrieved evidence, and the retrieval state itself. In an inspectable, ontology-guided knowledge-graph RAG (KG-RAG) system across six question-answering snapshots, we measure the agreement blind spot directly: at five samples per question, 42% of KG-RAG errors and 59% of dense-retrieval errors carry zero answer dispersion, so agreement has nothing to rank, while evidence- and retrieval-state checks still flag most of them. The decomposition supports an auditable decision rule: accepting an answer only when answer, evidence, and retrieval checks all agree that it is low-risk reaches 91.9% pooled precision against a 69.7% accept-all rate. The cost is coverage: it certifies only 7.7% of answers as low-risk. On the clinical calibration domain it reaches 100% precision under an automated judge; this is an in-domain automated-label upper bound, not a clinical safety claim, and still needs human validation. Confidence in RAG is object-specific: when answers agree, the useful question is which part of the pipeline to distrust.
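The agreement blind spot and the conjunctive decision rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The function names, the exact-match normalization of answers, the zero-dispersion threshold, and the boolean inputs standing in for the evidence and retrieval-state checks are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


def answer_dispersion(samples: list[str]) -> float:
    """Fraction of sampled answers that differ from the most common answer.

    Hypothetical measure: answers are compared after lowercasing and
    stripping whitespace, which is an assumption, not the paper's metric.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    normalized = [s.strip().lower() for s in samples]
    top_count = Counter(normalized).most_common(1)[0][1]
    return 1.0 - top_count / len(normalized)


@dataclass
class RiskChecks:
    """One low-risk verdict per object the abstract separates."""
    answer_low_risk: bool      # e.g. sampled answers agree
    evidence_low_risk: bool    # e.g. retrieved evidence supports the answer
    retrieval_low_risk: bool   # e.g. retrieval state is non-empty and on-topic


def accept(checks: RiskChecks) -> bool:
    """Accept only when all three checks agree the answer is low-risk."""
    return (
        checks.answer_low_risk
        and checks.evidence_low_risk
        and checks.retrieval_low_risk
    )


# A locked-in case: five samples agree, but retrieval returned nothing,
# so the model is answering from parametric memory.
samples = ["Paris", "paris", "Paris ", "Paris", "Paris"]
retrieved_items: list[str] = []  # hypothetical empty retrieval state

checks = RiskChecks(
    answer_low_risk=answer_dispersion(samples) == 0.0,
    evidence_low_risk=False,  # nothing retrieved, so nothing supports the answer
    retrieval_low_risk=len(retrieved_items) > 0,
)

print(answer_dispersion(samples))  # zero dispersion: agreement has nothing to rank
print(accept(checks))              # rejected despite perfect agreement
```

In this sketch, agreement alone would pass the answer, while the conjunctive rule rejects it because the evidence and retrieval checks fail. That is the trade the abstract reports: higher precision at the cost of certifying fewer answers.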