Calibrated Sampling-Free Uncertainty Estimation in Bayesian Deep Learning

2026-06-15 | Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors address the problem that deep learning models often give overconfident predictions, which is risky in high-stakes tasks. They focus on Bayesian methods, which estimate uncertainty by averaging predictions over many sampled versions of the model's weights, and note that this sampling is too slow at test time. To solve this, the authors propose Calibrated Variance Propagation (CVP), which estimates uncertainty in a single pass through the model. CVP adds a new propagation rule for normalization layers, reuses recent techniques for activation functions, and applies a light calibration step to absorb the remaining error. The resulting uncertainty estimates are comparable in accuracy to those from sampling-based methods at a fraction of the cost, and the approach works on transformers and convolutional networks.

Bayesian methods, uncertainty estimation, variance propagation, deep learning, transformers, normalization layers, activation functions, residual calibration, convolutional neural networks
Authors
Tobias Jan Wieczorek, Leon de Andrade, Thomas Möllenhoff, Marcus Rohrbach
Abstract
Modern deep learning models remain notoriously prone to overconfidence, limiting their reliability in high-stakes applications. Bayesian methods aim to counter this by learning a distribution over model parameters, and recent advances now make this feasible for large-scale architectures at costs comparable to AdamW. However, a challenge remains at test time: predictions must be averaged across many forward passes with weights sampled from the posterior, which is prohibitively expensive. Variance propagation offers an efficient alternative, computing layer-wise analytical approximations of uncertainty in a single forward pass. While such techniques are effective for MLPs, extending them to modern architectures remains challenging because of the increased depth and diversity of layer types. To fill this gap, we propose Calibrated Variance Propagation (CVP), which introduces a new propagation method for normalization layers, combines it with recent techniques for handling activation functions, and absorbs residual error through a lightweight calibration step. CVP yields uncertainty estimates comparable in accuracy to those of MC sampling across transformers and CNNs, at a fraction of the cost. Against prior variance propagation work, CVP improves coverage at $0.5\%$ risk from $8.2\%$ to $14.6\%$ with BEiT-3 on Visual Reasoning (NLVR2) and from $2.6\%$ to $10.8\%$ with ViLT on VQAv2, with gains extending to convolutional architectures.
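To make the contrast between sampling and single-pass propagation concrete, the following is a minimal sketch of generic variance propagation through one linear layer, set against a Monte Carlo estimate of the same quantities. It is not the CVP method: the abstract does not specify CVP's normalization-layer rule or its calibration step, so neither is shown. The sketch assumes a mean-field setting in which weights, biases, and inputs are mutually independent and each is summarized by a mean and a variance. The function names `propagate_linear` and `mc_linear` are hypothetical.

```python
import numpy as np


def propagate_linear(x_mean, x_var, w_mean, w_var, b_mean, b_var):
    """Closed-form mean and variance of y = W x + b.

    Assumes (for illustration) that all entries of W, x, and b are
    mutually independent, each described by a mean and a variance.
    """
    y_mean = w_mean @ x_mean + b_mean
    # For independent w and x:
    #   Var[w x] = Var[w] * (Var[x] + E[x]^2) + E[w]^2 * Var[x]
    # Variances of independent terms add across the input dimension.
    y_var = w_var @ (x_var + x_mean**2) + (w_mean**2) @ x_var + b_var
    return y_mean, y_var


def mc_linear(x_mean, x_var, w_mean, w_var, b_mean, b_var, n_samples, rng):
    """Monte Carlo estimate of the same moments using Gaussian samples."""
    d_out, d_in = w_mean.shape
    w = w_mean + np.sqrt(w_var) * rng.standard_normal((n_samples, d_out, d_in))
    x = x_mean + np.sqrt(x_var) * rng.standard_normal((n_samples, d_in))
    b = b_mean + np.sqrt(b_var) * rng.standard_normal((n_samples, d_out))
    # One sampled forward pass per row of the sample axis.
    y = np.einsum("sij,sj->si", w, x) + b
    return y.mean(axis=0), y.var(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_out = 4, 3
    x_mean, x_var = rng.normal(size=d_in), rng.uniform(0.1, 0.5, size=d_in)
    w_mean = rng.normal(size=(d_out, d_in))
    w_var = rng.uniform(0.01, 0.1, size=(d_out, d_in))
    b_mean, b_var = rng.normal(size=d_out), rng.uniform(0.01, 0.1, size=d_out)

    args = (x_mean, x_var, w_mean, w_var, b_mean, b_var)
    print("analytic:", propagate_linear(*args))
    print("sampled: ", mc_linear(*args, n_samples=100_000, rng=rng))
```

For a linear layer under these independence assumptions the propagated moments are exact, so the sampled estimate only approaches them as the number of samples grows. Nonlinear layers such as activations and normalization have no such exact rule in general and require approximations, which is the part of the pipeline the abstract says CVP targets.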