Entropy Minimization without Model Collapse: Mitigating Prediction Bias in Medical Imaging

2026-06-01 | Machine Learning

Machine Learning, Computer Vision and Pattern Recognition
AI summary

The authors study model collapse, a failure that can occur when a model is adapted to new data at test time: the model ends up making the same wrong prediction for every input. They find that distribution shifts cause the features of different classes to blend together, which biases the model's predictions toward some classes, and that entropy minimization makes this worse by reinforcing those mistakes. To fix this, the authors propose Distribution Shift Bias Reduction (DSBR), a method that balances how much each predicted class contributes to the adaptation objective, preventing collapse. They test the method on medical images and ImageNet-C and show that it improves stability without any extra training, since it operates only at test time.

Entropy Minimization, Test-Time Adaptation, Model Collapse, Feature Clusters, Prediction Bias, Distribution Shift, Decision Boundary, Unsupervised Learning, Medical Imaging Datasets, ImageNet-C
Authors
Tim Nielen, Sameer Ambekar, Johannes Kiechle, Daniel M. Lang, Julia A. Schnabel
Abstract
Entropy minimization (EM) is the dominant objective for test-time adaptation, yet its failure mode, model collapse, remains poorly understood. In this work, we show that distribution shifts can cause feature clusters corresponding to distinct classes in the model's representation space to merge while the decision boundary remains fixed. This induces a systematic skew in the predicted class distribution, with some classes overrepresented and others suppressed, which we refer to as prediction bias. We show that entropy minimization amplifies this prediction bias by tightening the existing clusters, reinforcing the incorrect groupings until all predictions collapse to a trivial solution. To demonstrate the significance of prediction bias and to mitigate it, we propose Distribution Shift Bias Reduction (DSBR), a bias-correcting objective that targets this failure mode by equalizing the contribution of each predicted class to the unsupervised entropy minimization loss. To study this failure mode, we design suitable adaptation settings using four medical imaging datasets and additionally evaluate on ImageNet-C. We find that DSBR consistently stabilizes test-time adaptation, prevents model collapse, and matches or outperforms state-of-the-art methods. Moreover, DSBR operates solely at test time.
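The abstract says DSBR equalizes the contribution of each predicted class to the entropy minimization loss but does not give the exact formula. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It contrasts the plain batch-mean entropy objective with a hypothetical class-balanced variant. That variant groups samples by their hard (argmax) prediction, averages entropy within each group, and then averages across groups. The sketch also computes the histogram of predicted classes, one simple way to observe the skew the abstract calls prediction bias. The function names (`em_loss`, `class_balanced_em_loss`, `prediction_histogram`) are illustrative only. Plain Python lists stand in for the probability tensors that a real test-time adaptation loop would backpropagate through.

```python
import math
from collections import defaultdict


def entropy(p):
    """Shannon entropy (in nats) of one probability vector; zero entries are skipped."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def argmax(p):
    """Index of the largest probability (first one wins on ties)."""
    return max(range(len(p)), key=lambda c: p[c])


def prediction_histogram(probs, num_classes):
    """Fraction of samples whose argmax prediction falls in each class.

    A strongly non-uniform histogram on roughly class-balanced data is one
    observable symptom of prediction bias.
    """
    counts = [0] * num_classes
    for p in probs:
        counts[argmax(p)] += 1
    return [c / len(probs) for c in counts]


def em_loss(probs):
    """Plain entropy minimization objective: mean entropy over the batch.

    Every sample gets equal weight, so classes that are predicted often
    dominate the objective.
    """
    return sum(entropy(p) for p in probs) / len(probs)


def class_balanced_em_loss(probs):
    """HYPOTHETICAL class-balanced variant (an assumption, not the paper's DSBR).

    Samples are grouped by hard predicted class, entropy is averaged inside
    each group, and the group means are averaged, so every predicted class
    present in the batch contributes equally regardless of its size.
    """
    groups = defaultdict(list)
    for p in probs:
        groups[argmax(p)].append(entropy(p))
    per_class = [sum(v) / len(v) for v in groups.values()]
    return sum(per_class) / len(per_class)


# Toy batch: three confident predictions for class 0, one uncertain one for class 1.
batch = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.4, 0.6]]
print(prediction_histogram(batch, 2))           # [0.75, 0.25]
print(round(em_loss(batch), 4))                 # 0.4121
print(round(class_balanced_em_loss(batch), 4))  # 0.499
```

In this toy batch, the plain mean gives the majority class three quarters of the weight, whereas the balanced variant gives each predicted class half. The single minority-class sample therefore carries as much weight in the objective as all three majority-class samples combined. Grouping by hard argmax is itself an assumption; a soft assignment weighted by the predicted probabilities would be an equally plausible design.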