Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net

2026-04-13 | Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors worked on improving how computer programs check the accuracy of automatically drawn treatment targets in radiotherapy, specifically for complex treatments that irradiate the bone marrow and lymph nodes. They developed a method that not only delineates the target volume automatically but also produces uncertainty maps highlighting where mistakes are likely, helping clinicians focus their review. Several techniques for making these uncertainty maps more reliable were tested, and combining calibration with checkpoint-based ensembling gave the best results. This approach could make the review process more efficient and safer in clinical settings.

Clinical Target Volume, Radiotherapy, Total Marrow and Lymph Node Irradiation, Auto-segmentation, Uncertainty Quantification, Calibration, Ensemble Methods, nnU-Net, Predictive Entropy, Quality Assurance
Authors
Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono
Abstract
Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet it remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce this workload, safe clinical deployment requires reliable cues indicating where a model may be wrong. In this work, we propose a budget-aware, uncertainty-driven quality assurance (QA) framework built on nnU-Net that combines uncertainty quantification and post-hoc calibration to produce voxel-wise uncertainty maps, based on predictive entropy, that can guide targeted manual review. We compare temperature scaling (TS), deep ensembles (DE), checkpoint ensembles (CE), and test-time augmentation (TTA), evaluated both individually and in combination, using TMLI as a representative use case. Reliability is assessed through ROI-masked calibration metrics and through uncertainty-error alignment under realistic revision constraints, summarized as the area under the curve (AUC) over the top 0-5% most uncertain voxels. Across configurations, segmentation accuracy remains stable, whereas TS substantially improves calibration. Uncertainty-error alignment improves most with calibrated checkpoint-based inference, yielding uncertainty maps that more consistently highlight regions requiring manual edits. Overall, integrating calibration with efficient ensembling appears to be a promising strategy for implementing a budget-aware QA workflow for radiotherapy segmentation.
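As a rough illustration of how a calibrated, entropy-based uncertainty map can be computed, the sketch below fits a single temperature on held-out voxels, averages the temperature-scaled softmax over ensemble members, and takes the predictive entropy of that average. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the "members" could be independently trained models (DE), saved checkpoints of one training run (CE), or TTA views mapped back to the original grid, and a simple 1-D grid search stands in for the gradient-based optimizer usually used to fit the temperature.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fit_temperature(val_logits, val_labels, grid=None):
    """Hypothetical helper: pick one temperature T minimising voxel-wise NLL.

    val_logits: (C, N) logits for N held-out voxels; val_labels: (N,) int classes.
    Assumption: a 1-D grid search replaces the usual gradient-based fit.
    """
    if grid is None:
        grid = np.linspace(0.5, 5.0, 91)
    idx = np.arange(val_labels.size)
    best_t, best_nll = 1.0, np.inf
    for t in grid:
        p = softmax(val_logits / t, axis=0)
        # negative log-likelihood of the reference class at each voxel
        nll = -np.mean(np.log(p[val_labels, idx] + 1e-12))
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t


def ensemble_entropy(member_logits, temperature=1.0):
    """Hypothetical helper: voxel-wise predictive entropy of the averaged,
    temperature-scaled softmax.

    member_logits: (M, C, ...) logits from M ensemble members.
    Returns an array of shape (...) with one entropy value per voxel.
    """
    probs = softmax(np.asarray(member_logits, dtype=float) / temperature, axis=1)
    mean_p = probs.mean(axis=0)  # average probabilities over members -> (C, ...)
    return -np.sum(mean_p * np.log(mean_p + 1e-12), axis=0)
```

For example, `ensemble_entropy(member_logits, temperature=fit_temperature(val_logits, val_labels))` gives a map whose highest values mark voxels where members disagree or are individually unsure; a temperature above 1 softens overconfident outputs and therefore raises the entropy of confident voxels.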
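The abstract does not specify which ROI-masked calibration metric is used. One common choice, assumed here, is the expected calibration error (ECE) computed only over voxels inside a region of interest, so that the large, trivially confident background does not dominate the score. The sketch below uses a hypothetical function name, a binary foreground/background setting, and equal-width confidence bins; it illustrates the idea rather than reproducing the paper's metric.

```python
import numpy as np


def roi_masked_ece(prob_fg, labels, roi_mask, n_bins=10):
    """Hypothetical helper: binary ECE restricted to voxels inside roi_mask.

    prob_fg:  foreground probabilities (any shape)
    labels:   reference mask of the same shape (0/1)
    roi_mask: boolean mask of the same shape selecting voxels for the metric
    """
    roi = np.asarray(roi_mask, dtype=bool)
    p = np.asarray(prob_fg, dtype=float)[roi]
    y = np.asarray(labels).astype(bool)[roi]
    pred = p >= 0.5
    conf = np.where(pred, p, 1.0 - p)  # confidence of the predicted class, in [0.5, 1]
    correct = (pred == y).astype(float)
    # equal-width bins on [0.5, 1]; digitize against interior edges gives 0..n_bins-1
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    bin_idx = np.digitize(conf, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            # |accuracy - confidence| weighted by the fraction of ROI voxels in the bin
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)
```

Restricting the metric to the ROI changes what it measures: confidently wrong voxels outside the mask leave the masked ECE untouched, whereas an unmasked ECE would be pulled up by them.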
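The budget-constrained uncertainty-error alignment admits several readings. One plausible interpretation, assumed here and not confirmed by the abstract, is: rank voxels by uncertainty; for each review budget b between 0 and 5% of the voxels, measure the fraction of all segmentation errors (voxels where the prediction disagrees with the corrected contour) found among the b most uncertain voxels; then report the area under this recall-versus-budget curve, normalized by the maximum budget. The function and parameter names below are hypothetical.

```python
import numpy as np


def budget_auc(uncertainty, error, max_budget=0.05, n_points=51):
    """Hypothetical helper: area under the error-recall vs. review-budget curve
    for budgets in [0, max_budget], normalised by max_budget (higher is better).

    uncertainty: voxel-wise uncertainty (e.g. predictive entropy), any shape
    error:       boolean map of voxels the prediction got wrong, same shape
    """
    u = np.asarray(uncertainty, dtype=float).ravel()
    e = np.asarray(error, dtype=bool).ravel()
    order = np.argsort(-u, kind="stable")              # most uncertain voxels first
    # captured[k] = number of errors among the k most uncertain voxels
    captured = np.concatenate([[0], np.cumsum(e[order])])
    total_errors = max(int(e.sum()), 1)                # avoid division by zero
    budgets = np.linspace(0.0, max_budget, n_points)   # fraction of voxels reviewed
    k = np.floor(budgets * u.size + 1e-9).astype(int)  # voxels reviewable per budget
    recall = captured[k] / total_errors
    # trapezoidal rule written out explicitly
    area = np.sum((recall[1:] + recall[:-1]) / 2.0 * np.diff(budgets))
    return float(area / max_budget)
```

Under this normalization, a random ranking scores roughly `max_budget / 2` in expectation (0.025 for a 5% budget), which gives a reference point when comparing uncertainty maps from different configurations.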