From Reports to Ontologies: Ontology-Guided Representation Learning for 12-Lead ECG

2026-05-25 · Computational Engineering, Finance, and Science

AI summary

The authors developed MAR-ECG, a new method to teach computers how to understand ECG signals by using a heart-related medical knowledge graph instead of relying on paired text reports. Their approach uses two tasks: one that helps the model learn connections between related heart conditions based on an ontology, and another that looks at detailed heart signal features across different timescales. They trained MAR-ECG on many ECG recordings and found it performs better than previous models, especially when only a small amount of labeled data is available. Even without text reports, their method matches the accuracy of approaches that use both ECG signals and clinical notes.

12-lead ECG, self-supervised learning, SNOMED-CT, masked autoregressive model, contrastive learning, physiological signal processing, ontology alignment, multimodal learning, clinical diagnostic codes
Authors
Lei Xu, Fahad Sohrab, Mehmet Yamac, Merja Heinaniemi, Moncef Gabbouj
Abstract
The 12-lead electrocardiogram (ECG) is a quasi-periodic, multi-channel signal with diagnostic content spanning timescales from millisecond waveform morphology to multi-second rhythm dynamics. Existing ECG representation learning relies on signal-only self-supervision or ECG-text multimodal alignment, neither of which exploits the structured diagnostic codes attached to every clinical recording. We present MAR-ECG, an ontology-guided masked autoregressive framework that supervises the encoder with a curated 40-node SNOMED-CT cardiac graph through graph alignment, eliminating the need for paired clinical reports. MAR-ECG combines two complementary objectives. First, graph-smoothed contrastive learning (GSCL) anchors the encoder's rhythm-pooled features to the SNOMED graph, softening supervision targets by ontology distance so that clinically related concepts reinforce one another rather than function as hard negatives. Second, multi-scale physiological supervision complements GSCL with signal-derived patch auxiliaries that target rhythm-physiology statistics extracted automatically from the input, extending supervision beyond the patch tier at no annotation cost. Pretrained on ~40K publicly available 12-lead ECGs with SNOMED-CT codes and evaluated by frozen linear probing on five downstream classification benchmarks, MAR-ECG consistently outperforms a strong masked-autoregressive baseline, with mean gains in the low-label regime. Despite the absence of paired clinical text, MAR-ECG achieves performance competitive with state-of-the-art multimodal ECG-text methods.
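
The abstract describes GSCL only at a high level, so the following is a minimal sketch of one way "softening supervision targets by ontology distance" could work, not the authors' implementation. It assumes (none of this is stated in the source) that ontology distance is the hop count on an undirected graph, that targets are a softmax over negative distance, that each recording carries a single code, and that the loss is a soft-target cross-entropy over cosine similarities. All function names, the temperatures `tau_graph` and `tau`, and the 4-node toy graph are hypothetical.

```python
import numpy as np
from collections import deque


def shortest_path_distances(n_nodes, edges):
    """All-pairs hop distances on an undirected graph (BFS from every node)."""
    adj = [[] for _ in range(n_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = np.full((n_nodes, n_nodes), np.inf)
    for s in range(n_nodes):
        dist[s, s] = 0.0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if np.isinf(dist[s, v]):
                    dist[s, v] = dist[s, u] + 1.0
                    queue.append(v)
    return dist


def soft_targets(dist, labels, tau_graph=1.0):
    """Assumed smoothing rule: softmax over negative hop distance.

    Row i is a distribution over ontology nodes that peaks at labels[i]
    and decays for nodes farther away in the graph.
    """
    logits = -dist[labels] / tau_graph
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)  # unreachable nodes (inf distance) get weight 0
    return weights / weights.sum(axis=1, keepdims=True)


def gscl_loss(features, node_embeddings, targets, tau=0.1):
    """Soft-target cross-entropy between cosine-similarity logits and targets."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    g = node_embeddings / np.linalg.norm(node_embeddings, axis=1, keepdims=True)
    logits = f @ g.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


# Hypothetical toy ontology: node 0 is a parent concept, nodes 1 and 2 are
# its children, and node 3 is a child of node 2.
toy_edges = [(0, 1), (0, 2), (2, 3)]
toy_dist = shortest_path_distances(4, toy_edges)
toy_labels = np.array([1])            # one recording labeled with node 1
toy_targets = soft_targets(toy_dist, toy_labels)
toy_nodes = np.eye(4)                 # stand-in node embeddings
loss_aligned = gscl_loss(toy_nodes[[1]], toy_nodes, toy_targets)
loss_distant = gscl_loss(toy_nodes[[3]], toy_nodes, toy_targets)
print(toy_targets.round(3), loss_aligned, loss_distant)
```

In this sketch a feature vector that lands on a neighboring concept is penalized less than one that lands on a distant concept, which is the qualitative behavior the abstract attributes to graph smoothing. Real recordings often carry several codes, which would require a multi-label variant of the target construction.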