Towards Reliable Fetal Ultrasound Interpretation with Multi-Agent Collaboration

2026-05-25, Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Multiagent Systems
AI summary

The authors developed FetUSAgents, a multi-agent system that interprets fetal ultrasound images by coordinating specialized visual tools through collaborating language-model agents. Unlike prior methods that dedicate one model to one task, the system decomposes a clinical question into subtasks that progress from anatomical recognition to quantitative measurement. A method called Dual-Path Evidence Arbitration (DPEA) combines language-model reasoning with structured evidence computed by the visual tools. To test the approach, the authors introduce FetUS-VQA, a benchmark of 1,892 images and 3,205 question-answer pairs across 10 clinical tasks, and report that FetUSAgents exceeds the strongest baseline by more than 25 percent in VQA accuracy in out-of-distribution experiments. The work aims at automated, reliable interpretation of prenatal ultrasound images.

Fetal ultrasound, Visual question answering, Large language models, Anatomical segmentation, Biometric measurement, Multimodal learning, Evidence arbitration, Multi-agent systems, Clinical report generation, Dataset benchmark
Authors
Xiaotian Hu, Mingxuan Liu, Junwei Huang, Kasidit Anmahapong, Yifei Chen, Yiming Huang, Xuguang Bai, Zihan Li, Hongjia Yang, Yingqi Hao, Hong Xu, Yu Jiang, Tian Tian, Yi Liao, Haibo Qu, Qiyuan Tian
Abstract
Automated fetal ultrasound interpretation requires a workflow from visual perception, including plane recognition and anatomical segmentation, to clinical understanding, including biometric measurement and diagnostic reporting. However, the prevailing "one-task, one-model" paradigm limits systematic integration of evidence across this multi-step process. Although multimodal large language models (MLLMs) show promising visual understanding, their limited domain-specific grounding and hallucination risks restrict reliability in fetal ultrasound analysis. To address these limitations, we propose FetUSAgents, a tool-augmented multi-agent system for comprehensive fetal ultrasound interpretation, supporting visual question answering (VQA), report generation, image captioning, and video summarization. FetUSAgents coordinates task-specific visual tools through collaborative LLM agents and decomposes clinical queries into subtasks that progress from anatomical recognition to quantitative measurement. We further introduce Dual-Path Evidence Arbitration (DPEA), which integrates LLM-based deliberative reasoning with structured computational evidence from specialized visual tools. A retrieval-enhanced evidence bank consolidates intermediate findings to support traceable and clinically grounded conclusions. In addition, we construct FetUS-VQA, a dedicated VQA benchmark for fetal ultrasound, comprising 1,892 images and 3,205 question-answer pairs across 10 clinical tasks. Extensive out-of-distribution experiments show that FetUSAgents outperforms general and medical MLLMs, exceeding the strongest baseline by more than 25 percent in VQA accuracy. These results suggest a scalable route toward evidence-driven clinical assistants for prenatal imaging. Code is available.
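The abstract does not say how DPEA weighs its two paths, so the following is only an illustrative sketch of the general idea of arbitrating between an LLM-reasoned answer and tool-derived evidence. Every name here (`Evidence`, `arbitrate`, `min_tool_conf`, the tool names) and the confidence-based rule itself are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """One entry in a hypothetical evidence bank."""
    source: str        # hypothetical tool name, e.g. "plane_classifier"
    finding: str       # the answer this tool output supports
    confidence: float  # tool-reported confidence in [0, 1]


def arbitrate(llm_answer: str,
              llm_confidence: float,
              evidence: list[Evidence],
              min_tool_conf: float = 0.5) -> tuple[str, list[Evidence]]:
    """Pick a final answer and return the tool evidence that backs it.

    Assumed rule (not from the paper): discard weak tool findings,
    take the finding with the highest summed confidence, and let it
    override the LLM only when its strongest supporting item is at
    least as confident as the LLM.
    """
    trusted = [e for e in evidence if e.confidence >= min_tool_conf]
    if not trusted:
        # No usable tool evidence: fall back to the reasoning path.
        return llm_answer, []

    # Sum confidence per candidate finding across tools.
    scores: dict[str, float] = {}
    for e in trusted:
        scores[e.finding] = scores.get(e.finding, 0.0) + e.confidence
    best = max(scores, key=scores.get)
    support = [e for e in trusted if e.finding == best]

    if best == llm_answer:
        # Both paths agree: keep the answer and attach its evidence.
        return llm_answer, support
    if max(e.confidence for e in support) >= llm_confidence:
        # Paths disagree and the tools are more confident.
        return best, support
    # Paths disagree and the LLM is more confident; no tool backs it.
    return llm_answer, []


bank = [
    Evidence("plane_classifier", "femur", 0.9),
    Evidence("segmenter", "femur", 0.8),
    Evidence("noisy_tool", "abdomen", 0.3),
]
answer, support = arbitrate("abdomen", 0.6, bank)
print(answer, [e.source for e in support])
```

Returning the supporting evidence alongside the answer mirrors the abstract's emphasis on traceable conclusions; a real system would presumably use a richer arbitration policy than this single confidence comparison.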