From Point Estimates to Distributions: GMM Pooling for MIL in Preterm Birth Prediction

2026-06-22, Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Machine Learning
AI summary

The authors worked on predicting preterm birth using multiple ultrasound images taken from each patient rather than just one. They treated the prediction problem as a multiple instance learning task, where each patient is represented by a varying number of images grouped together. Their new method uses a Gaussian Mixture Model to combine information from all images in a way that keeps track of differences between them. This approach improved prediction accuracy compared to earlier methods and also performed well on another medical imaging task related to lymph nodes.

Preterm Birth, Transvaginal Ultrasound, Multiple Instance Learning, Gaussian Mixture Model, Pooling, Feature Distribution, PR-AUC, ROC-AUC, F1-score, Medical Image Analysis
Authors
Hussain Alasmawi, Numan Saeed, Soha Said, Mohammad Yaqub
Abstract
Preterm birth (PTB) prediction can enable targeted surveillance and timely intervention, yet most ultrasound-based models use a single selected transvaginal ultrasound (TVUS) frame per patient, even though routine exams acquire multiple cervical images. We formulate PTB prediction as a multiple instance learning (MIL) problem, representing each patient as a variable-sized bag of TVUS images with a single outcome label. To move beyond standard MIL aggregators that collapse a bag into a point estimate, we propose Gaussian Mixture Model (GMM) pooling, which summarizes all images in a bag into a fixed-length representation by modeling their feature distribution. This design captures intra-patient variability. We evaluate the method on a private clinical cohort and on a public lymph node metastasis benchmark. For PTB prediction, GMM pooling raises PR-AUC from 0.44 for the instance-based model to 0.56. On the lymph node benchmark, it achieves state-of-the-art performance, with 0.91 F1-score and 0.89 ROC-AUC for classification and 0.18 MAE for regression. The code is publicly available at https://github.com/HussainAlasmawi/GMM_Pooling.
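
The abstract does not spell out how the pooling is computed, so the following is only a minimal sketch of the general idea. It assumes K diagonal-covariance Gaussian components whose parameters are already given (for example, learned jointly with the encoder), soft-assigns each instance feature to the components, and concatenates the per-component weights, means, and variances of the bag. The names `gmm_pool`, `means`, `log_vars`, and `log_priors` are hypothetical, and the authors' implementation in the linked repository may differ.

```python
import numpy as np

def gmm_pool(bag, means, log_vars, log_priors):
    """Pool a (n_instances, d) bag of features into a fixed-length vector.

    means, log_vars: (K, d) parameters of K diagonal Gaussians (assumed given).
    log_priors: (K,) log mixing weights (assumed given).
    Returns a vector of length K + 2 * K * d, independent of n_instances.
    """
    n, d = bag.shape
    var = np.exp(log_vars)                                  # (K, d)
    diff = bag[:, None, :] - means[None, :, :]              # (n, K, d)
    # Log-density of every instance under every diagonal Gaussian component.
    log_pdf = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                      + np.sum(log_vars, axis=1)
                      + d * np.log(2.0 * np.pi))            # (n, K)
    log_post = log_priors + log_pdf
    log_post -= log_post.max(axis=1, keepdims=True)         # stabilise the softmax
    resp = np.exp(log_post)
    resp /= resp.sum(axis=1, keepdims=True)                 # responsibilities, (n, K)

    mass = resp.sum(axis=0)                                 # soft instance count per component
    weights = mass / n                                      # bag-level mixing weights, sum to 1
    safe = np.maximum(mass, 1e-8)[:, None]                  # avoid division by zero
    bag_means = resp.T @ bag / safe                         # (K, d)
    bag_vars = resp.T @ bag ** 2 / safe - bag_means ** 2    # (K, d)
    bag_vars = np.maximum(bag_vars, 0.0)                    # clip round-off negatives
    return np.concatenate([weights, bag_means.ravel(), bag_vars.ravel()])

# Bags of different sizes map to vectors of the same length.
rng = np.random.default_rng(0)
K, d = 3, 4
means = rng.normal(size=(K, d))
log_vars = np.zeros((K, d))
log_priors = np.log(np.full(K, 1.0 / K))
small = gmm_pool(rng.normal(size=(2, d)), means, log_vars, log_priors)
large = gmm_pool(rng.normal(size=(7, d)), means, log_vars, log_priors)
print(small.shape, large.shape)  # (27,) (27,)
```

Unlike mean or max pooling, which keep only a single point per bag, the variance terms in this sketch retain some information about how much the images of one patient differ from each other. The output is also invariant to the order of the instances, which is the usual requirement for a MIL aggregator.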