Unified Approach for Weakly Supervised Multicalibration
2026-05-11 • Machine Learning
AI summary
The authors address the problem of obtaining reliable uncertainty estimates across many subgroups when clean input-label pairs are unavailable, as is common in weakly supervised learning. They develop estimators of multicalibration error and post-hoc correction methods for these settings. Their framework combines contamination-matrix risk rewrites with witness-based calibration constraints, which yields corrected multicalibration moments with finite-sample guarantees. They also introduce weak-label multicalibration boost (WLMC), a generic post-hoc recalibration algorithm, and evaluate multicalibration behavior across several weak-supervision settings.
Multicalibration, Weakly Supervised Learning, Positive-Unlabeled Learning, Calibration Error, Contamination Matrix, Post-hoc Recalibration, Uncertainty Estimation, Finite-sample Guarantees, Label Noise
Authors
Futoshi Futami, Takashi Ishida
Abstract
Multicalibration requires predicted scores to agree with label probabilities across rich families of subgroups and score-dependent tests, but existing methods require clean input-label pairs for evaluation and post-processing. This assumption fails in weakly supervised learning (WSL) regimes -- including positive-unlabeled, unlabeled-unlabeled, and positive-confidence learning -- where clean labels are costly or unavailable even though reliable uncertainty estimates may be crucial. We address this gap by developing estimators of multicalibration error and post-hoc correction methods for WSL settings in which clean input-label pairs are unavailable. We propose a unified framework for estimating and correcting multicalibration under weak supervision by combining contamination-matrix risk rewrites with witness-based calibration constraints, yielding corrected multicalibration moments with finite-sample guarantees. We further propose weak-label multicalibration boost (WLMC), a generic post-hoc recalibration algorithm under weak supervision. Finally, we conduct experiments across multiple weak-supervision settings to evaluate multicalibration behavior and offer empirical insight into uncertainty estimation under weak supervision.
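To make the idea of a corrected calibration moment concrete, the following is a minimal sketch and not the authors' estimator or WLMC. It assumes the simplest contamination model, binary labels with class-conditional noise and known flip rates, where the contamination matrix is 2x2 and can be inverted in closed form. The function name `corrected_moment`, the flip rates `rho0` and `rho1`, the synthetic data, and the single group-indicator witness are all illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def corrected_moment(weak_y, scores, witness, rho0, rho1):
    """Estimate E[w(X) * (Y - f(X))] using only noisy labels.

    Assumed noise model (hypothetical, for illustration):
      rho0 = P(weak label = 1 | Y = 0), rho1 = P(weak label = 0 | Y = 1),
      with the noise independent of X given Y.
    """
    denom = 1.0 - rho0 - rho1
    if denom <= 0:
        raise ValueError("correction requires rho0 + rho1 < 1")
    # Inverting the 2x2 contamination gives an unbiased surrogate for Y:
    # E[(weak_y - rho0) / denom | X] = P(Y = 1 | X).
    y_surrogate = (weak_y - rho0) / denom
    return float(np.mean(witness * (y_surrogate - scores)))


# Synthetic example: a predictor that overestimates by 0.1 on one subgroup.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, size=n)
group = (x > 0.5).astype(float)          # witness: subgroup indicator
eta = 0.1 + 0.7 * x                      # true P(Y = 1 | x)
scores = eta + 0.1 * group               # miscalibrated on the subgroup
y = (rng.uniform(size=n) < eta).astype(float)

# Flip clean labels with class-conditional rates to get weak labels.
rho0, rho1 = 0.2, 0.3
flip = np.where(y == 1,
                rng.uniform(size=n) < rho1,
                rng.uniform(size=n) < rho0)
weak_y = np.where(flip, 1.0 - y, y)

clean = float(np.mean(group * (y - scores)))       # needs clean labels
naive = float(np.mean(group * (weak_y - scores)))  # ignores the noise
corrected = corrected_moment(weak_y, scores, group, rho0, rho1)

print(f"clean={clean:.4f} naive={naive:.4f} corrected={corrected:.4f}")
```

By construction the population value of the clean moment is -0.1 × P(group) = -0.05. The corrected estimate should land near that value without ever touching `y`, while the naive estimate that plugs in weak labels directly is biased. Settings such as positive-unlabeled or unlabeled-unlabeled learning involve different contamination structures, so this sketch only illustrates the general shape of a risk rewrite.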