Cross-Modal Iteration Distillation for Robust IHD Screening: The IDNet Framework and A New Benchmark
2026-06-29 • Computer Vision and Pattern Recognition
AI summary
The authors developed a new method called IDNet to help detect ischemic heart disease using pictures of the inside of the eye (retinal images) combined with some clinical data. They created a special part called the Cross-Modal Distillation Aggregator (CDA) to better mix detailed eye images with simpler clinical information. They also built a large, reliable dataset from the UK Biobank for testing this approach. Their method worked better than using only images or only clinical data, showing that combining both types of data improves detection.
Color Fundus Photography, Ischemic Heart Disease, Multimodal Learning, Cross-Modal Distillation Aggregator, Retinal Imaging, Clinical Variables, UK Biobank, Visual Encoders, Data Fusion
Authors
Yongchang Gao, Junjie Pang, Shuaiyu Yang, Yusheng Yang, Xichao Jia, Shaojie Li, Hongfei Zhang, Jia Mu
Abstract
Color Fundus Photography (CFP) offers a low-cost and non-invasive route for ischemic heart disease (IHD) screening, but current studies are limited by scarce public benchmarks and ineffective fusion of retinal images with sparse clinical variables. We propose IDNet, a multimodal framework with a Cross-Modal Distillation Aggregator (CDA) that uses learnable queries to sequentially integrate left-eye, right-eye, and clinical features, mitigating the imbalance between high-dimensional visual features and low-dimensional tabular inputs. We also construct a reproducible UK Biobank benchmark with open-source curation and quality-control pipelines, yielding 50,410 images from 25,205 subjects. On this benchmark, IDNet outperforms image-only, clinical-only, and several multimodal baselines, and CDA consistently improves multiple visual encoders as a plug-in fusion module.
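The abstract describes the CDA only at a high level, so the following is a minimal sketch of one way learnable queries could sequentially absorb left-eye, right-eye, and clinical features. It is not the authors' implementation. The single-head attention, the residual update, the per-variable clinical embedding, the mean pooling, and all names and sizes (`cross_attend`, `embed_clinical`, `aggregate`, `D`, `NUM_QUERIES`) are assumptions made for illustration, and random values stand in for trained parameters and encoder outputs.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Single-head scaled dot-product cross-attention with a residual update.

    Each query attends over all tokens of one modality and adds the
    attended context to itself (assumed update rule, not from the paper).
    """
    d = len(queries[0])
    updated = []
    for q in queries:
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d) for t in tokens]
        weights = softmax(scores)
        context = [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)]
        updated.append([qi + ci for qi, ci in zip(q, context)])
    return updated


def embed_clinical(values, weight, bias):
    """Lift each scalar clinical variable to a d-dimensional token (hypothetical embedding)."""
    return [
        [v * w + b for w, b in zip(weight[i], bias[i])]
        for i, v in enumerate(values)
    ]


def aggregate(queries, left_tokens, right_tokens, clinical_tokens):
    """Sequentially fold left-eye, right-eye, then clinical tokens into the queries."""
    for tokens in (left_tokens, right_tokens, clinical_tokens):
        queries = cross_attend(queries, tokens)
    # Mean-pool the queries into one fused vector for a downstream classifier.
    n = len(queries)
    return [sum(q[j] for q in queries) / n for j in range(len(queries[0]))]


# Illustrative sizes only; none of these come from the paper.
D, NUM_QUERIES, NUM_PATCHES, NUM_CLINICAL = 8, 4, 16, 3
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


queries = rand_matrix(NUM_QUERIES, D)      # stand-in for learnable queries
left = rand_matrix(NUM_PATCHES, D)         # stand-in for left-eye encoder tokens
right = rand_matrix(NUM_PATCHES, D)        # stand-in for right-eye encoder tokens
clinical = embed_clinical(
    [0.5, -1.2, 0.0],                      # made-up standardized clinical values
    rand_matrix(NUM_CLINICAL, D),
    rand_matrix(NUM_CLINICAL, D),
)

fused = aggregate(queries, left, right, clinical)
print(len(fused))  # 8
```

In this sketch the fused representation has a fixed size set by the queries, so a modality with many tokens (image patches) and one with few (clinical variables) each pass through the same number of query slots. That is one plausible reading of how a query-based aggregator could reduce the dimensional imbalance the abstract mentions.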