When Fine-Tuning Changes the Evidence: Architecture-Dependent Semantic Drift in Chest X-Ray Explanations

2026-04-09 · Computer Vision and Pattern Recognition

AI summary

The authors study how medical image classifiers change the visual cues they rely on after fine-tuning, even when accuracy stays the same. They call this change "semantic drift": the model's reasoning shifts even though its predictions do not. Using chest X-rays and three neural network architectures, they find that coarse anatomical focus remains stable, but detailed attribution maps reorganize differently depending on the architecture and the attribution method used. They also show that explanations at the same performance level can diverge, demonstrating that explanation stability depends on model type, training phase, and explanation method.

transfer learning · fine-tuning · semantic drift · attribution maps · chest X-ray classification · DenseNet201 · ResNet50V2 · InceptionV3 · LayerCAM · GradCAM++
Authors
Kabilan Elangovan, Daniel Ting
Abstract
Transfer learning followed by fine-tuning is widely adopted in medical image classification due to consistent gains in diagnostic performance. However, in multi-class settings with overlapping visual features, improvements in accuracy do not guarantee stability of the visual evidence used to support predictions. We define semantic drift as systematic changes in the attribution structure supporting a model's predictions between transfer learning and full fine-tuning, reflecting potential shifts in underlying visual reasoning despite stable classification performance. Using a five-class chest X-ray task, we evaluate DenseNet201, ResNet50V2, and InceptionV3 under a two-stage training protocol and quantify drift with reference-free metrics capturing spatial localization and structural consistency of attribution maps. Across architectures, coarse anatomical localization remains stable, while overlap IoU reveals pronounced architecture-dependent reorganization of evidential structure. Beyond single-method analysis, stability rankings can reverse across LayerCAM and GradCAM++ under converged predictive performance, establishing explanation stability as an interaction between architecture, optimization phase, and attribution objective.
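The abstract quantifies drift partly through an overlap IoU over attribution maps, though the exact definition is not given on this page. As a rough, illustrative sketch only: one common reference-free formulation binarizes each attribution map at a per-map quantile threshold and takes the intersection-over-union of the resulting high-attribution masks. The `quantile` value and the function name below are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def attribution_iou(map_a, map_b, quantile=0.8):
    """IoU of the high-attribution regions of two attribution maps.

    Each map is binarized by keeping pixels at or above its own
    `quantile` threshold; the intersection-over-union of the two
    binary masks is returned (1.0 = identical evidence regions,
    0.0 = fully reorganized evidence).
    """
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    mask_a = a >= np.quantile(a, quantile)
    mask_b = b >= np.quantile(b, quantile)
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

# A map compared with itself shows no drift:
m = np.random.default_rng(0).random((7, 7))
print(attribution_iou(m, m))  # identical maps give IoU 1.0
```

Comparing the transfer-learned and fully fine-tuned models' maps for the same image with such a score would expose evidential reorganization that headline accuracy conceals, which is the kind of architecture-dependent effect the abstract reports.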