A multi-architecture study of specificity refinement and false-positive mechanism analysis in prostate MRI
2026-06-29 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Machine Learning
AI summary
The authors studied why prostate MRI detection models sometimes flag non-cancerous regions as cancer (false positives). They found that these false positives closely resemble real cancer in their imaging features, and that this holds across different model architectures, which suggests it is a property of the images rather than of the detection method. They also tested a small add-on refinement module to reduce these errors. On one data split it improved specificity (correctly clearing cancer-free cases) without lowering sensitivity (detection of true cancers), but the effect varied across dataset splits. Overall, false positives are linked to image characteristics, and the refinement module can help, though not consistently.
prostate MRI, false positives, post-hoc refinement, case-level specificity, T2-weighted imaging, apparent diffusion coefficient, nnU-Net, cross-validation, contrast ratio, McNemar test
Authors
Yongbo Shu, Kewen Chen, Yifeng Yuan, Zirui Xin, Luo Lei, Yang Yang, Xi Chen, Aijing Luo
Abstract
Objectives: To characterize residual false positives in prostate MRI detection, and to evaluate a lightweight post-hoc refinement head for case-level specificity. Materials and Methods: This retrospective study used PI-CAI (5-fold cross-validation) and Prostate158 (n=158; external). A context-aware evidence head and an 89,216-parameter refinement head were trained on a frozen detection backbone; the evidence head was also trained on four further backbones (bare nnU-Net, bare U-Net, bare Mamba, MIGF-Mamba). For each false-positive region, T2-weighted, apparent-diffusion-coefficient, and high-b-value contrast ratios versus peri-lesional rings were compared against ground-truth lesions and contralateral benign regions. Results: False positives were closer to true cancers than to benign tissue in evidence and in raw T2-weighted and apparent-diffusion-coefficient contrast, reproducing 35/35 across five architectures (Cohen's d 1.10; FP/benign evidence ratio 2.38x) and 105/105 across modality-perturbation scenarios. On PI-CAI fold-0, refinement raised case-level specificity from 0.469 to 0.549 (+17.2% relative) at preserved sensitivity (0.943); 5-fold cross-validation showed fold-conditional behavior (9/15 observations positive; range -22% to +28%). On Prostate158, both models saturated (McNemar pooled p=0.69), while the false-positive contrast-matching finding replicated. Conclusion: Residual false positives are contrast-matched to cancer (sharing raw imaging features rather than histologically confirmed mimicry) and reproduce across five architectures, which indicates a data-level imaging property rather than a model-specific artifact; post-hoc refinement adds practical specificity in-domain but is fold-conditional.
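To make the contrast-ratio comparison concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the contrast ratio is the mean intensity inside a region divided by the mean intensity in a surrounding ring of fixed pixel width, and that Cohen's d uses the pooled standard deviation; the abstract specifies neither detail. The names `ring_pixels`, `contrast_ratio`, and `cohens_d`, the 2D list-of-lists image layout, and the ring width are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def ring_pixels(region, width, shape):
    """Pixels within `width` steps (8-connected) of the region, excluding the region.

    `region` is a set of (row, col) tuples; `shape` is (rows, cols).
    The ring definition is an assumption, not taken from the paper.
    """
    rows, cols = shape
    ring = set()
    for r, c in region:
        for dr in range(-width, width + 1):
            for dc in range(-width, width + 1):
                p = (r + dr, c + dc)
                inside_image = 0 <= p[0] < rows and 0 <= p[1] < cols
                if inside_image and p not in region:
                    ring.add(p)
    return ring


def contrast_ratio(image, region, width=2):
    """Mean intensity in the region divided by mean intensity in its ring (assumed definition)."""
    shape = (len(image), len(image[0]))
    ring = ring_pixels(region, width, shape)
    inside = mean(image[r][c] for r, c in region)
    outside = mean(image[r][c] for r, c in ring)
    return inside / outside


def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


# Toy example: a 7x7 slice with background 100 and a darker 3x3 region of 50,
# loosely mimicking a hypointense focus on an ADC map (synthetic values).
toy_image = [[100] * 7 for _ in range(7)]
toy_region = {(r, c) for r in range(2, 5) for c in range(2, 5)}
for r, c in toy_region:
    toy_image[r][c] = 50

print(contrast_ratio(toy_image, toy_region, width=1))  # 0.5
```

Under these assumptions, one ratio would be computed per region and per modality, and the distributions for false-positive, ground-truth-lesion, and contralateral benign regions could then be compared pairwise with `cohens_d`.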