EviRank: Evidence-Based Confidence Estimation for LLM-Based Ranking

2026-06-03 • Information Retrieval

AI summary

The authors address problems with how large language models (LLMs) estimate confidence when recommending items, noting that current methods either give a single overall score for the whole list or produce uniformly low per-item scores that cannot separate reliable positions from unreliable ones. They propose EviRank, which measures confidence by combining three types of evidence from a single forward pass and calibrating for each item's position in the ranking. The calibrated confidence is then used to improve the ranking itself. Their experiments on three datasets show that EviRank outperforms existing methods on both recommendation and uncertainty quantification.

Large Language Models, Recommendation Systems, Uncertainty Quantification, Confidence Estimation, Ranking, Position-aware Calibration, Model Internals, Evidence Aggregation

Authors

Meng Yan, Cai Xv, Xujing Wang, Ziyu Guan, Wei Zhao

Abstract

Large language models (LLMs) show promise for recommendation, but they raise reliability concerns due to limited domain coverage and inherent stochasticity. Existing uncertainty quantification methods face two fundamental challenges: (1) a global confidence score designed for question answering fails to reveal which positions in a ranking list are unreliable; (2) fine-grained confidence extracted from model internals exhibits uniformly low values across all positions, making it impossible to filter out unreliable predictions. To address these challenges, we propose EviRank, an evidence-based confidence estimation method for LLM-based ranking. We extract three complementary types of evidence from a single forward pass and aggregate them via reliable opinion aggregation. Furthermore, we recognize that ranking positions are inherently unequal and introduce position-aware calibration. Finally, the calibrated confidence guides ranking optimization. Experiments on three datasets demonstrate that our method achieves state-of-the-art performance on both recommendation and uncertainty quantification.
