Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs

2026-05-08 · Computation and Language

AI summary

The authors created a tool called CMR-EXTR that converts free-text heart MRI (cardiac magnetic resonance, CMR) reports into structured data that computers can process. The system also assigns a confidence score to each extracted field to support quality checks. It is designed to run fully offline and to reduce the amount of manual annotation needed. In tests it reached 99.65% accuracy at the level of individual variables, and the authors describe it, to their knowledge, as the first CMR-specific extraction tool with built-in confidence estimation.

Cardiac Magnetic Resonance (CMR), Natural Language Processing, Data Extraction, Confidence Estimation, Teacher-Student Distillation, Structured Data, Clinical Decision Support, Uncertainty Quantification, Offline Inference
Authors
Yi Yu, Parker Martin, Zhenyu Bu, Yixuan Liu, Yi-Yu Zheng, Orlando Simonetti, Yuchi Han, Yuan Xue
Abstract
Converting free-text cardiac magnetic resonance (CMR) reports into auditable structured data remains a bottleneck for cohort assembly, longitudinal curation, and clinical decision support. We present CMR-EXTR, a lightweight framework that converts free-text CMR reports into structured data and assigns a per-field confidence score for quality control. A teacher-student distillation pipeline enables fully offline inference while limiting manual annotation. The uncertainty estimate integrates three complementary principles (distribution plausibility, sampling stability, and cross-field consistency) and is used to prioritize fields for human review. Experiments show that CMR-EXTR achieves 99.65% variable-level accuracy and produces informative confidence scores. To our knowledge, this is the first CMR-specific extraction system with integrated confidence estimation. The code is available at https://github.com/yuyi1005/CMR-EXTR.
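The abstract names three uncertainty principles but not how each is computed or combined. The sketch below shows one plausible way such a per-field score could be assembled and used for review triage; it is not the CMR-EXTR implementation (see the linked repository for that). Everything here is an illustrative assumption: the field names (`lvef_pct`, `lvedv_ml`, `lvesv_ml`, `lvsv_ml`), the plausible ranges, the binary plausibility check, majority-vote agreement over repeated sampled extractions as "sampling stability", the product combination rule, and the 0.8 review threshold. The cross-field check uses the standard left-ventricular relations SV = EDV - ESV and EF = 100 · SV / EDV.

```python
from collections import Counter

# Illustrative sketch only; not the CMR-EXTR implementation.
# Hypothetical plausible ranges for a few LV fields (assumed values).
PLAUSIBLE_RANGES = {
    "lvef_pct": (5.0, 90.0),
    "lvedv_ml": (30.0, 500.0),
    "lvesv_ml": (5.0, 400.0),
    "lvsv_ml": (5.0, 250.0),
}


def plausibility_score(field, value):
    """Distribution plausibility: 1.0 inside the assumed range, else 0.0.
    Fields without a listed range are treated as always plausible."""
    lo, hi = PLAUSIBLE_RANGES.get(field, (float("-inf"), float("inf")))
    return 1.0 if lo <= value <= hi else 0.0


def stability_score(samples):
    """Sampling stability: majority value and its share among repeated
    sampled extractions (assumes at least one sample)."""
    value, count = Counter(samples).most_common(1)[0]
    return value, count / len(samples)


def consistency_scores(record, ef_tol=3.0, vol_tol=5.0):
    """Cross-field consistency via SV = EDV - ESV and EF = 100 * SV / EDV.
    Fields in a violated relation get 0.0; all others keep 1.0."""
    scores = {f: 1.0 for f in record}
    edv, esv = record.get("lvedv_ml"), record.get("lvesv_ml")
    if edv is None or esv is None or edv <= 0:
        return scores
    sv = record.get("lvsv_ml")
    if sv is not None and abs(sv - (edv - esv)) > vol_tol:
        for f in ("lvedv_ml", "lvesv_ml", "lvsv_ml"):
            scores[f] = 0.0
    ef = record.get("lvef_pct")
    if ef is not None and abs(ef - 100.0 * (edv - esv) / edv) > ef_tol:
        for f in ("lvedv_ml", "lvesv_ml", "lvef_pct"):
            scores[f] = 0.0
    return scores


def field_confidences(sampled_fields):
    """Pick each field's majority value, then combine the three signals
    per field by multiplication (an assumed combination rule)."""
    record, stability = {}, {}
    for field, samples in sampled_fields.items():
        record[field], stability[field] = stability_score(samples)
    consistency = consistency_scores(record)
    confidences = {
        f: plausibility_score(f, record[f]) * stability[f] * consistency[f]
        for f in record
    }
    return record, confidences


def triage(confidences, threshold=0.8):
    """Return, sorted by name, the fields whose confidence is below threshold."""
    return sorted(f for f, c in confidences.items() if c < threshold)


if __name__ == "__main__":
    # Five hypothetical sampled extractions per field; EF reads 65%,
    # which disagrees with the EF implied by EDV and ESV (about 54.7%).
    samples = {
        "lvef_pct": [65.0] * 5,
        "lvedv_ml": [150.0] * 5,
        "lvesv_ml": [68.0] * 4 + [67.0],
        "lvsv_ml": [82.0] * 5,
    }
    record, conf = field_confidences(samples)
    print(triage(conf))  # → ['lvedv_ml', 'lvef_pct', 'lvesv_ml']
```

In this sketch a single inconsistent value lowers confidence for every field in the violated relation, because the check cannot tell which of them is wrong; a reviewer then inspects all of them. A graded plausibility score (for example, based on a fitted distribution) or a different combination rule would be equally reasonable choices.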