Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation

2026-05-25 | Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors developed a new method called ANAUS to improve how computers learn from ultrasound images by focusing on important body parts rather than just random image areas. They created a tool that can automatically identify anatomical regions without needing manual labels, which helps the computer understand the images better. Their approach teaches the model to recognize the same structures from different views and to fill in missing parts, making the learning more accurate. Tests on several public ultrasound datasets showed their method works better than existing ones while still being fast enough for practical use.

self-supervised learning, ultrasound imaging, anatomical structures, representation learning, latent prompt, domain adaptation, feature invariance, image segmentation, contextual prediction
Authors
Chunzheng Zhu, Yijun Wang, Jianxin Lin, Feng Wang, Hongwei Wang, Lei Zhao, Shengli Li, Kenli Li
Abstract
Self-supervised pre-training has gained increasing prominence for learning transferable representations in medical imaging, yet existing methods for ultrasound (US) images operate at the image or frame level, overlooking the anatomical context needed for clinically aligned representation learning. In this work, we propose ANAUS, an anatomy-anchored ultrasound self-supervision framework that shifts representation learning from generic visual regions to clinically meaningful anatomical structures. Using a learnable latent prompt engine together with a one-time domain adaptation on existing public image-mask pairs, we enable the LP-SAM module to delineate anatomy at scale without annotations. Building upon this anatomical grounding, we propose a dual-policy self-supervised learning paradigm consisting of inter-view semantics-aware anatomy-separating alignment and contextual core-region prediction. Specifically, the former enforces feature invariance within identical anatomical regions while promoting discriminability across distinct structures; the latter compels the model to reconstruct corrupted regions, thereby capturing fine-grained structural details. Extensive evaluations on six public datasets demonstrate that ANAUS consistently outperforms current state-of-the-art methods while maintaining the computational efficiency essential for clinical deployment. Code is available at https://github.com/zhcz328/ANAUS.
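To make the anatomy-separating alignment idea concrete, the following is a minimal sketch under stated assumptions, not the ANAUS implementation. The abstract does not specify the loss, so a standard InfoNCE objective over region-averaged features stands in for it. All names (`masked_mean`, `region_info_nce`, `temperature`) are hypothetical, features are plain Python lists with one vector per pixel, and the masks are assumed to be given integer region labels per pixel.

```python
import math


def masked_mean(features, mask, region_id):
    """Average the feature vectors of all pixels labelled `region_id`."""
    picked = [f for f, m in zip(features, mask) if m == region_id]
    dim = len(features[0])
    return [sum(f[d] for f in picked) / len(picked) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (small epsilon avoids 0/0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b + 1e-12)


def region_info_nce(feats_a, mask_a, feats_b, mask_b, temperature=0.1):
    """InfoNCE over region prototypes from two views (assumed stand-in loss).

    The same region in the other view is the positive (invariance);
    every other region in that view is a negative (discriminability).
    """
    regions = sorted(set(mask_a) & set(mask_b))
    if not regions:
        raise ValueError("the two views share no anatomical region")
    protos_a = {r: masked_mean(feats_a, mask_a, r) for r in regions}
    protos_b = {r: masked_mean(feats_b, mask_b, r) for r in regions}

    loss = 0.0
    for i, r in enumerate(regions):
        logits = [cosine(protos_a[r], protos_b[s]) / temperature for s in regions]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(regions)


# Toy example: four pixels, two regions, 2-D features per pixel.
feats_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
feats_b = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
mask = [1, 1, 2, 2]

aligned_loss = region_info_nce(feats_a, mask, feats_b, mask)
swapped_loss = region_info_nce(feats_a, mask, feats_b, [2, 2, 1, 1])
```

In this toy setup the loss is low when matching regions carry similar features in both views and high when the region labels of the second view are swapped. A real pipeline would pool dense encoder features with predicted masks and pair this term with a reconstruction objective for corrupted regions, which this sketch does not cover.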