Evaluating self-supervised echocardiographic representations across downstream extraction strategies for left-ventricular segmentation and ejection fraction estimation

2026-06-22 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition

AI summaryⓘ

The authors studied how well self-supervised learning (SSL) methods can help analyze heart ultrasound images, especially for tasks like finding the left ventricle and measuring how well it pumps. Instead of using just one way to test the learned image features, they tried several methods with increasing complexity to see how much useful information the features really had. They found that simple tests underestimated the usefulness of the SSL features, and more flexible tests showed the features almost matched the performance of fully supervised models. The authors suggest that using multiple evaluation methods is important when judging SSL quality for detailed medical image tasks.

Self-supervised learningEchocardiographyLeft ventricular segmentationEjection fractionRepresentation learningFrozen linear probeLightweight decoderBootstrap Your Own Segmentation (BYOS)DINOv3Downstream evaluation

Authors

Sylwia Majchrowska, Philip Teare

Abstract

Self-supervised learning (SSL) is increasingly used in medical imaging to reduce annotation requirements, but representation quality is often judged using a single downstream evaluation setting. For dense clinical tasks, this can confound representation quality with the capacity of the downstream model used to recover task-relevant information. We present a systematic evaluation of self-supervised representations for left-ventricular segmentation and ejection fraction (EF) estimation from apical four-chamber echocardiography on EchoNet-Dynamic. Rather than relying on a single downstream probe, we compare a hierarchy of extraction strategies with increasing expressivity: heuristic extraction without mask-supervised training, frozen linear probes, frozen lightweight decoder probes, and partial fine-tuning. We apply this framework to two complementary representation families: generic frozen self-DIstillation with NO labels (DINOv3) features and a task-adapted dense self-supervised representation, Bootstrap Your Own Segmentation (BYOS). In both families, heuristic extraction substantially understated what was recoverable from the frozen representation. For DINOv3, performance improved from Dice 0.684 and EF mean absolute error (MAE) 13.01 under heuristic extraction to Dice 0.906 and EF MAE 9.65 with a frozen lightweight decoder, approaching a supervised U-Net baseline (Dice 0.915, EF MAE 9.72). For BYOS, performance improved from Dice 0.687 and EF MAE 17.83 under heuristic extraction to Dice 0.902 and EF MAE 8.74 with a frozen lightweight decoder. These results show that conclusions about self-supervised representation quality in dense echocardiographic analysis depend strongly on the downstream extraction strategy used for evaluation. We therefore argue that multi-strategy evaluation is an important methodological consideration for SSL in dense medical image analysis.

View PDFOpen arXiv