Explanation-Aware Learning for Enhanced Interpretability in Biomedical Imaging
2026-05-11 • Computer Vision and Pattern Recognition
AI summary
The authors study how to teach deep learning models for medical images to focus on important, meaningful areas rather than irrelevant parts when making predictions. Instead of just showing explanations after training, they add explanation guidance during training to steer the model's attention. They introduce new ways to measure how well the model's explanations match real clinical features and find that there is a balance between model accuracy and explanation quality. Their experiments on chest X-rays show that explanation supervision can improve interpretability without hurting accuracy, and they provide advice on how to use it effectively despite noisy annotations.
deep neural networks, medical image diagnosis, saliency maps, explanation supervision, training objective, annotation coverage, saliency precision, chest X-ray, interpretability, explanation loss
Authors
Zubair Faruqui, Rahul Dubey
Abstract
Deep neural networks for medical image diagnosis often achieve high predictive accuracy while relying on spurious or clinically irrelevant visual cues, limiting their trustworthiness in practice. Post-hoc explanation methods are widely used to visualize model decisions as saliency maps; however, these explanations do not influence how models learn during training, allowing non-causal or confounding features to persist. This motivates incorporating explanation supervision directly into the training objective to guide model attention toward clinically meaningful regions and promote clinically grounded decision-making. This paper presents a systematic approach to integrating explanation loss into model training and analyzes how different explanation loss designs and supervision strengths influence both predictive performance and the spatial faithfulness of explanations. To assess interpretability quantitatively, two complementary explanation performance metrics, annotation coverage and saliency precision, are introduced, enabling rigorous evaluation beyond qualitative visualization. Our experimental results reveal a clear trade-off, controlled by the explanation loss coefficient, between explanation quality and predictive accuracy. Furthermore, quantitative statistical analysis shows consistently improved explanation alignment while maintaining comparable accuracy. Experiments were conducted on annotated chest X-ray datasets; however, the proposed framework applies to a broad range of annotated biomedical imaging modalities. Overall, these findings demonstrate that explanation supervision is not a monolithic design choice and provide practical guidance for incorporating explanation loss into training objectives under noisy clinical annotations.
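The sketch below illustrates one way an explanation loss can be added to a standard classification objective and how the two reported metrics might be computed. The abstract does not specify the saliency method, the loss form, or the metric definitions, so the input-gradient saliency, the outside-mask penalty, the `expl_weight` coefficient, and the thresholded formulas for annotation coverage and saliency precision used here are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of explanation-supervised training in PyTorch.
# All function names, the input-gradient saliency, and the metric
# definitions below are assumptions made for illustration only.

import torch
import torch.nn.functional as F


def input_gradient_saliency(model, images, labels):
    """Per-pixel saliency as the absolute input gradient of the true-class logit."""
    images = images.clone().requires_grad_(True)
    logits = model(images)
    class_scores = logits.gather(1, labels.unsqueeze(1)).sum()
    grads, = torch.autograd.grad(class_scores, images, create_graph=True)
    saliency = grads.abs().sum(dim=1)                      # (B, H, W)
    # Normalize each map to [0, 1] so it is comparable with binary masks.
    peak = saliency.flatten(1).max(dim=1).values.view(-1, 1, 1)
    return saliency / (peak + 1e-8)


def explanation_loss(saliency, masks):
    """Penalize saliency mass falling outside the annotated region (assumed form)."""
    outside = saliency * (1.0 - masks)                     # masks: (B, H, W) in {0, 1}
    return outside.flatten(1).mean(dim=1).mean()


def training_step(model, images, labels, masks, expl_weight=0.5):
    """Combined objective: classification loss + expl_weight * explanation loss."""
    cls_loss = F.cross_entropy(model(images), labels)
    saliency = input_gradient_saliency(model, images, labels)
    return cls_loss + expl_weight * explanation_loss(saliency, masks)


@torch.no_grad()
def annotation_coverage(saliency, masks, threshold=0.5):
    """Fraction of the annotated region covered by salient pixels (assumed definition)."""
    salient = (saliency >= threshold).float()
    overlap = (salient * masks).flatten(1).sum(dim=1)
    return (overlap / (masks.flatten(1).sum(dim=1) + 1e-8)).mean().item()


@torch.no_grad()
def saliency_precision(saliency, masks, threshold=0.5):
    """Fraction of salient pixels that fall inside the annotation (assumed definition)."""
    salient = (saliency >= threshold).float()
    overlap = (salient * masks).flatten(1).sum(dim=1)
    return (overlap / (salient.flatten(1).sum(dim=1) + 1e-8)).mean().item()
```

In this sketch, `expl_weight` plays the role of the explanation loss coefficient discussed in the abstract: increasing it pulls saliency more strongly toward the annotated region, and under noisy clinical annotations a smaller coefficient or a softer penalty would be the natural adjustment.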