Relevance Is Not Permission: Warranted Attention for Value Contributions

2026-06-29 • Artificial Intelligence

AI summary

The authors explain that just because a model pays attention to certain information doesn't mean that information actually helps make correct predictions. They introduce a method called Warrant that controls which parts of the attended information are allowed to influence the prediction by adding a learned permission factor. This approach improves prediction accuracy across several tasks by ensuring only truly helpful information affects the output. Their results show that being selective about what contributes to the prediction path is important and differs from just measuring attention weights.

attention mechanism, weighted value term, query-item permission, link prediction, temporal point process, retrieval-augmented generation, knowledge graph, prediction path, evidence selection

Authors

Minwoo Yu, Young-guk Ha

Abstract

Relevance is not permission. Attention lets a model read key-value items related to the current query, but it does not guarantee that the value contribution of such an item becomes prediction evidence. A retrieved passage may be relevant to a question without being supporting evidence, and a historical fact or temporal neighbor may even blur true-tail ranking or the current edge score. This paper formalizes this gap as a permission problem for the weighted value term alpha_ij * v_j that is actually added to the prediction path. We propose Warrant, a path-localized interface that preserves attention relevance alpha_ij, exposes the value path leading to the primary metric, and, in the full model, turns alpha_ij * v_j into alpha_ij * g_ij * v_j through learned query-item permission g_ij. We place the same operator on the metric-defining value paths of CTDG link prediction, MTPP next-mark ranking, RAG supporting evidence selection, STPP next-location forecasting, and TKG tail prediction. Across 32 paired comparisons, 3 seeds, and 192 total runs, Warrant improves the primary metric in 27 comparisons; practical tiers consist of 10 substantial effects, 1 marginal effect, 8 positive but uncertain effects, 8 tie/negligible effects, and 5 drops. In the path-localization check, correct-path placement outperforms direction-aware Base performance in every domain and exceeds generic attention placement by +0.1076 AUC in CTDG and +0.0683 MRR in TKG. Ablations show that most TKG gains come from historical-tail value path exposure, whereas the core CTDG gain comes from edge-conditioned query-item permission. In conclusion, prediction evidence is not attention mass. A weighted value term becomes evidence only when it is warranted on the path to the metric.
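
The step from alpha_ij * v_j to alpha_ij * g_ij * v_j can be illustrated with a minimal sketch. The abstract does not specify how g_ij is parameterized, so the logistic gate over the concatenated query and key, along with the names warranted_readout, gate_w, and gate_b, are assumptions made here for illustration only, not the authors' implementation. The sketch keeps the attention weights alpha_ij untouched and applies the permission only to the weighted value term that is summed into the readout.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def warranted_readout(query, keys, values, gate_w, gate_b):
    """Return (alpha, gates, plain, warranted) for a single query i.

    alpha[j]  : attention relevance alpha_ij (scaled dot-product softmax)
    gates[j]  : query-item permission g_ij in (0, 1)
    plain     : sum_j alpha_ij * v_j          (standard attention readout)
    warranted : sum_j alpha_ij * g_ij * v_j   (gated value contribution)
    """
    scale = math.sqrt(len(query))
    alpha = softmax([dot(query, k) / scale for k in keys])

    # ASSUMPTION: a hypothetical gate, logistic over the concatenation [query; key_j].
    # The abstract only says g_ij is a learned query-item permission.
    gates = [sigmoid(dot(gate_w, query + k) + gate_b) for k in keys]

    dim = len(values[0])
    plain = [sum(a * v[d] for a, v in zip(alpha, values)) for d in range(dim)]
    warranted = [
        sum(a * g * v[d] for a, g, v in zip(alpha, gates, values))
        for d in range(dim)
    ]
    return alpha, gates, plain, warranted


# Toy inputs (illustrative values, not data from the paper).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
alpha, gates, plain, warranted = warranted_readout(
    query, keys, values, gate_w=[0.0, 0.0, 0.0, 0.0], gate_b=0.0
)
```

In this sketch the gate never renormalizes alpha_ij: an item can stay highly relevant (large alpha_ij) while contributing little to the prediction path (small g_ij), which is the distinction between relevance and permission that the abstract draws. With zero gate weights and zero bias, every g_ij equals 0.5, so the warranted readout is exactly half the plain one.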