Uncertainty Estimation in Pathology Foundation Models via Deep Mutual Learning

2026-06-29 • Computer Vision and Pattern Recognition


AI summary

The authors address a limitation of pathology foundation models (PFMs), which analyze whole-slide images (WSIs) but do not provide reliable confidence estimates for their predictions. They introduce DICE, a method that ensembles multiple frozen, pre-trained PFMs and uses the disagreement among them as a proxy for uncertainty. The ensemble members are aligned via deep mutual learning, and the authors show theoretically that this objective upper-bounds the model uncertainty, so that disagreement signals when a prediction may be wrong. The ensemble's consensus also localizes abnormalities at the patch level without explicit supervision. On three challenging WSI benchmarks, DICE flags failure-prone cases in both in- and out-of-distribution settings while matching or outperforming state-of-the-art baselines in classification, calibration, and localization.

Pathology Foundation Models, Whole-Slide Image Analysis, Uncertainty Estimation, Model Ensemble, Deep Mutual Learning, Out-of-Distribution Detection, Medical Image Localization, Calibration, Failure Detection

Authors

Gbègninougbo Aurel Davy Tchokponhoue, Sevda Öğüt, Ali Idri, Dorina Thanou, Pascal Frossard

Abstract

Pathology foundation models (PFMs) offer generalizable representations for whole-slide image (WSI) analysis, yet their clinical adoption remains limited. Specifically, their predictions lack reliable confidence estimates, and no single PFM is universally best across tasks, which severely undermines trust in medical settings. To overcome this, we propose $\mathtt{DICE}$, a plug-and-play framework that ensembles $K$ frozen PFMs and models their disagreement as a proxy for uncertainty estimation. To ensure this proxy yields meaningful estimates, we align the ensemble members via deep mutual learning, and theoretically show that this objective upper-bounds the model uncertainty. Additionally, we demonstrate that the ensemble's consensus localizes abnormalities at the patch level without any explicit supervision. We evaluate $\mathtt{DICE}$ on three challenging WSI benchmarks. Notably, our framework provides reliable uncertainty estimates that accurately flag failure-prone cases under in- and out-of-distribution settings, while matching or outperforming SOTA baselines in classification, calibration, and localization. Overall, $\mathtt{DICE}$ takes a crucial step toward translating PFMs into uncertainty-aware decision-support systems.
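
To make the idea of disagreement as an uncertainty proxy concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state which disagreement measure DICE uses; the sketch assumes the mean pairwise KL divergence between the members' class distributions, which is the kind of term a deep mutual learning objective typically penalizes. The function names, the value K = 3, and the logits are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert one member's logits into a class probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def ensemble_consensus(probs):
    """Average the K members' distributions into one ensemble prediction."""
    k = len(probs)
    return [sum(p[c] for p in probs) / k for c in range(len(probs[0]))]

def pairwise_disagreement(probs):
    """Mean KL(p_i || p_j) over all ordered pairs i != j.

    Assumption: this stands in for the disagreement score; the measure
    actually used by DICE is not specified in the abstract.
    """
    k = len(probs)
    total = sum(
        kl_divergence(probs[i], probs[j])
        for i in range(k)
        for j in range(k)
        if i != j
    )
    return total / (k * (k - 1))

# Hypothetical logits from K = 3 classification heads on two slides.
agree = [softmax(z) for z in ([4.0, 0.0], [3.5, 0.2], [4.2, -0.1])]
split = [softmax(z) for z in ([4.0, 0.0], [0.0, 4.0], [0.5, 0.3])]

for name, probs in (("agree", agree), ("split", split)):
    print(name, ensemble_consensus(probs), pairwise_disagreement(probs))
```

Under these assumptions, the slide on which the members split receives a much larger disagreement score than the slide on which they agree, which is the behavior a failure-detection threshold would rely on.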
