PaCX-MAE: Physiology-Augmented Chest X-Ray Masked Autoencoder
2026-06-01 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Machine Learning
AI summary
The authors developed PaCX-MAE, a method that trains chest X-ray models with the help of paired physiological data, namely ECG recordings and laboratory results. At prediction time the model sees only the X-ray, yet it captures physiology-related detail better than a model pretrained on images alone. The approach improved performance on several benchmarks, especially on tasks that depend on physiological information, and it remained effective with very little labeled data. The authors also show that the model attends to physiologically relevant regions of the X-ray, such as the cardiac silhouette, that standard visual pretraining overlooks.
Chest X-ray (CXR), Electrocardiogram (ECG), Masked Autoencoding, Cross-modal Distillation, Physiological Priors, Medical Image Segmentation, Contrastive Learning, Zero-shot Learning, AUROC, F1 Score
Authors
Yancheng Liu, Kenichi Maeda, Manan Pancholy
Abstract
Clinical diagnosis often requires combining imaging with physiological measurements, yet deployed models typically operate on unimodal data. We present PaCX-MAE, a cross-modal distillation framework that injects physiological priors into chest X-ray (CXR) encoders while remaining strictly unimodal at inference. PaCX-MAE augments in-domain masked autoencoding with a dual contrastive-predictive objective, aligning CXR representations with paired ECG and laboratory embeddings. Extensive evaluation across nine benchmarks demonstrates consistent improvements over domain-specific MAE, particularly on physiology-dependent tasks (e.g., +2.7 AUROC on MedMod; +6.5 F1 on VinDr). The method proves highly label-efficient in the 1% regime and preserves anatomical fidelity, achieving parity with MAE on segmentation tasks. Zero-shot and attention analyses confirm that PaCX-MAE successfully learns to attend to physiological indicators, such as the cardiac silhouette, absent in standard visual pretraining.
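To make the "dual contrastive-predictive objective" more concrete, here is a minimal NumPy sketch of how such a loss could be combined with a masked-autoencoding term. The abstract does not give the exact formulation, so everything below is an assumption for illustration: the symmetric InfoNCE form of the contrastive term, the mean-squared-error form of the predictive term, the temperature, the loss weights, and all function names (`info_nce`, `predictive_loss`, `combined_loss`) are hypothetical rather than taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss; row i of `a` is paired with row i of `b`.

    Assumed form of the contrastive term, not confirmed by the paper.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (batch, batch) similarity matrix

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # stabilise the softmax
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))   # positives sit on the diagonal

    # Average the image-to-physiology and physiology-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

def predictive_loss(pred, target):
    """Mean squared error between predicted and target physiological embeddings.

    Assumed form of the predictive term, not confirmed by the paper.
    """
    return float(np.mean((pred - target) ** 2))

def combined_loss(cxr_emb, phys_emb, phys_pred, mae_loss, w_con=1.0, w_pred=1.0):
    # mae_loss: scalar reconstruction loss from the masked autoencoder branch.
    # w_con and w_pred are hypothetical weights chosen only for this sketch.
    return (mae_loss
            + w_con * info_nce(cxr_emb, phys_emb)
            + w_pred * predictive_loss(phys_pred, phys_emb))
```

In this sketch `phys_emb` stands for embeddings of the paired ECG or laboratory data, used only during training. At inference only the CXR encoder that produces `cxr_emb` would be kept, which is consistent with the abstract's claim that the model remains strictly unimodal.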