Bandwidth Selection in Kernel Density Estimation for Model Calibration

2026-06-29

Machine Learning
AI summary

The authors focus on improving how the calibration of neural networks is measured, which matters for trustworthy uncertainty estimates. Kernel Density Estimation (KDE) offers a smooth alternative to binning for quantifying miscalibration, but its reliability depends heavily on the choice of kernel bandwidth. To address this, the authors propose Risk Alignment (RA), which selects the bandwidth by aligning the KDE-reconstructed risk with the empirical risk, and they show theoretically that this alignment minimizes the bias of the calibration estimate. Their experiments across multiple architectures and datasets show that RA outperforms standard bandwidth selection methods such as Maximum Likelihood Estimation (MLE).

deep learning, uncertainty estimation, model calibration, Kernel Density Estimation (KDE), bandwidth selection, Maximum Likelihood Estimation (MLE), calibration error, Risk Alignment, empirical risk, predictive accuracy
Authors
Han Zhou, Teodora Popordanoska, Matthew Blaschko
Abstract
As deep learning models are increasingly deployed in high-stakes applications, providing well-calibrated uncertainty estimates has become as critical as achieving high predictive accuracy. While Kernel Density Estimation (KDE) has emerged as a smooth and continuous alternative to traditional binning for quantifying miscalibration, its reliability is heavily dependent on the choice of the kernel bandwidth. Standard selection techniques, such as Maximum Likelihood Estimation (MLE), often fail to produce optimal bandwidths for calibration tasks. In this work, we introduce Risk Alignment (RA), a novel optimization framework that determines the optimal bandwidth by aligning KDE-reconstructed risk with empirical risk. We theoretically demonstrate that this alignment minimizes calibration estimation bias across the data distribution, establishing a principled bandwidth selection criterion applicable to various metrics, including the challenging case of canonical calibration error. Extensive experiments across multiple architectures and datasets show that RA consistently outperforms standard bandwidth selection methods, yielding more reliable calibration assessments.
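
To make the bandwidth question concrete, the following is a minimal sketch for the binary case. It is not the authors' implementation: the abstract does not give the RA objective, so `risk_gap` is an assumed stand-in that compares a Brier risk rebuilt from the kernel estimate of E[y | p] (calibration term plus refinement term) with the empirical Brier risk. The function names, the Gaussian kernel, the leave-one-out estimator, the bandwidth grid, and the synthetic data are all illustrative assumptions. The MLE baseline is the standard leave-one-out likelihood criterion for KDE.

```python
import numpy as np

def gaussian_kernel_matrix(p, h):
    """Pairwise (unnormalized) Gaussian kernel weights between scores p, shape [n, n]."""
    diff = (p[:, None] - p[None, :]) / h
    return np.exp(-0.5 * diff ** 2)

def loo_conditional_accuracy(p, y, h):
    """Leave-one-out kernel (Nadaraya-Watson) estimate of E[y | p] at each sample."""
    k = gaussian_kernel_matrix(p, h)
    np.fill_diagonal(k, 0.0)  # exclude each point from its own estimate
    denom = k.sum(axis=1)
    return (k @ y) / np.maximum(denom, 1e-12)

def kde_calibration_error(p, y, h):
    """Kernel-based estimate of the squared (L2) calibration error."""
    c_hat = loo_conditional_accuracy(p, y, h)
    return float(np.mean((p - c_hat) ** 2))

def loo_log_likelihood(p, h):
    """Leave-one-out log-likelihood of the KDE over the scores (MLE criterion)."""
    n = len(p)
    k = gaussian_kernel_matrix(p, h) / (h * np.sqrt(2.0 * np.pi))
    np.fill_diagonal(k, 0.0)
    density = k.sum(axis=1) / (n - 1)
    return float(np.sum(np.log(np.maximum(density, 1e-300))))

def risk_gap(p, y, h):
    """ASSUMED alignment criterion (not taken from the paper):
    |reconstructed Brier risk - empirical Brier risk|."""
    c_hat = loo_conditional_accuracy(p, y, h)
    # Brier risk decomposes into calibration + refinement when c = E[y | p].
    reconstructed = np.mean((p - c_hat) ** 2) + np.mean(c_hat * (1.0 - c_hat))
    empirical = np.mean((y - p) ** 2)
    return float(abs(reconstructed - empirical))

def select_bandwidths(p, y, grid):
    """Return the grid bandwidths picked by LOO-MLE and by the assumed risk gap."""
    h_mle = max(grid, key=lambda h: loo_log_likelihood(p, h))
    h_align = min(grid, key=lambda h: risk_gap(p, y, h))
    return h_mle, h_align

# Synthetic, deliberately miscalibrated binary classifier (illustrative data only).
rng = np.random.default_rng(0)
n = 500
p = rng.beta(2.0, 2.0, size=n)              # predicted confidences
y = (rng.random(n) < p ** 2).astype(float)  # true accuracy is p**2, not p

grid = np.logspace(-3, 0, 25)               # hypothetical bandwidth grid
h_mle, h_align = select_bandwidths(p, y, grid)

print("MLE bandwidth:      ", h_mle, "CE estimate:", kde_calibration_error(p, y, h_mle))
print("Alignment bandwidth:", h_align, "CE estimate:", kde_calibration_error(p, y, h_align))
```

The two criteria answer different questions: the likelihood criterion only looks at how well the KDE fits the distribution of the scores, while the risk-based criterion also uses the labels, which is the kind of mismatch the abstract points to when it says MLE often fails for calibration tasks. The sketch covers only binary confidences; the canonical (multiclass) case mentioned in the abstract would need a kernel on the probability simplex.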