Unlocking In-Context Learning in Audio-Language Models from Decentralized Medical Audio
2026-06-22 • Machine Learning
Machine Learning, Sound
AI summary
The authors developed Federated Self-Contextualization (FSC), a method for diagnosing medical conditions from clinical sounds, such as breathing or heartbeats, using very few labeled examples. The approach clusters similar audio recordings without diagnostic labels to build training episodes, and uses a multimodal language model to diagnose a new recording by comparing it with a few labeled examples. The model is trained across multiple hospitals without sharing sensitive data. On held-out respiratory and cardiac conditions it reaches 71.6% accuracy in 2-way 2-shot evaluation, outperforming audio-language baselines by over 9%. This suggests the method can work in settings with limited labeled medical data.
Federated learning, Clinical audio diagnosis, Self-contextualization, Multimodal language models, Unsupervised clustering, In-context learning, Support-query pairs, Episodic training, Respiratory conditions, Cardiac conditions
Authors
Ran Piao, Tsai-Ning Wang, Martijn den Dekker, Linda Moonen, Hareld Kemps, Yuan Lu, Aaqib Saeed
Abstract
Clinical audio diagnosis in low-resource settings requires models that identify conditions from minimal examples without large annotated corpora. We propose Federated Self-Contextualization (FSC), a multimodal language model framework for in-context clinical audio diagnosis across federated hospital clients. FSC constructs pseudo-label episodes via unsupervised clustering of audio representations, bypassing scarce real diagnostic labels, and enables contextual reasoning from support-query pairs. Our progressive three-stage pipeline first aligns audio embeddings with the language model via caption-based pretraining, then adapts it for episodic in-context inference through federated optimization. At test time, given a small labeled support set, the model diagnoses an unseen query through multimodal reasoning. On held-out respiratory and cardiac conditions, FSC achieves 71.6% accuracy in 2-way 2-shot evaluation, outperforming audio-language baselines by over 9%.
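The pseudo-label episode construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here, the embeddings are synthetic stand-ins for audio representations, and the helper names `kmeans` and `sample_episode` are hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a cluster id per row."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points; fancy indexing returns a copy.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def sample_episode(labels, n_way=2, k_shot=2, seed=0):
    """Draw an N-way K-shot support set and one query from pseudo-labels."""
    rng = np.random.default_rng(seed)
    # A cluster qualifies only if it can supply K supports plus one query.
    ids, counts = np.unique(labels, return_counts=True)
    eligible = ids[counts >= k_shot + 1]
    if len(eligible) < n_way:
        raise ValueError("not enough sufficiently large clusters")
    ways = rng.choice(eligible, size=n_way, replace=False)
    support, spare = [], []
    for way in ways:
        idx = rng.permutation(np.flatnonzero(labels == way))
        support += [(int(i), int(way)) for i in idx[:k_shot]]
        spare.append((int(idx[k_shot]), int(way)))
    # The query comes from one of the sampled clusters, disjoint from support.
    query = spare[rng.integers(len(spare))]
    return support, query


# Synthetic stand-in for audio embeddings: two well-separated groups.
data_rng = np.random.default_rng(0)
embeddings = np.concatenate([
    data_rng.normal(0.0, 0.1, (10, 4)),
    data_rng.normal(5.0, 0.1, (10, 4)),
])
pseudo_labels = kmeans(embeddings, k=2)
support, query = sample_episode(pseudo_labels, n_way=2, k_shot=2)
print("support (index, pseudo-label):", support)
print("query (index, pseudo-label):", query)
```

Each episode pairs a small support set with one query whose pseudo-label is the training target, so no real diagnostic label is needed. In a federated setting, each hospital client would presumably build such episodes from its own local recordings; that step is not shown here.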