Missing Pattern Recognized Diffusion Imputation Model for Missing Not At Random

2026-05-25 • Machine Learning


AI summary

The authors introduce PRDIM, a method for imputing missing data when whether a value is missing depends on the unobserved value itself, a setting known as Missing Not at Random (MNAR). The method uses a pattern recognizer that approximates the missing pattern and guides the imputation toward more plausible values. The model is fit iteratively under an Expectation-Maximization (EM) algorithm, and the authors report consistently strong imputation performance under MNAR settings across multiple data modalities.

Missing Data, Imputation, Missing Not at Random (MNAR), Diffusion Model, Expectation-Maximization (EM) Algorithm, Pattern Recognition, Joint Distribution, Time-Series Data, Image Data

Authors

Gyuwon Sim, Sumin Lee, Heesun Bae, Byeonghu Na, Doyun Kwon, Ju-Hee Hwang, Jae-Young Lim, Il-Chul Moon

Abstract

Missing data arise frequently across diverse domains, including time-series and image data. In real-world data, whether a value is missing often depends on the unobserved value itself, a mechanism referred to as Missing Not at Random (MNAR). In this work, we introduce the Missing Pattern Recognized Diffusion Imputation Model (PRDIM), a novel framework that explicitly captures the missing pattern and precisely imputes unobserved values. PRDIM iteratively maximizes the likelihood of the joint distribution of the observed values and the missing mask under an Expectation-Maximization (EM) algorithm. To this end, we employ a pattern recognizer, which approximates the underlying missing pattern and provides guidance during every inference toward imputations that are more plausible with respect to the missing information. Through extensive experiments, we demonstrate that PRDIM consistently achieves strong imputation performance under MNAR settings across multiple data modalities.
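To make the MNAR setting described in the abstract concrete, the following minimal sketch simulates a mask whose missingness probability depends on the value itself and compares it with a mask drawn independently of the values. It is only an illustration of the definition, not the PRDIM method: the logistic missingness rule, the function names (`make_mnar_mask`, `make_mcar_mask`, `observed_mean`), and the parameters `steepness` and `p_missing` are all assumptions introduced here.

```python
import math
import random


def make_mnar_mask(values, steepness=2.0, seed=0):
    """Return a mask (1 = observed, 0 = missing) that depends on the values.

    Assumed rule for illustration: larger values are more likely to be
    missing, with P(missing | v) = sigmoid(steepness * v).
    """
    rng = random.Random(seed)
    mask = []
    for v in values:
        p_missing = 1.0 / (1.0 + math.exp(-steepness * v))
        mask.append(0 if rng.random() < p_missing else 1)
    return mask


def make_mcar_mask(values, p_missing=0.5, seed=0):
    """Return a mask drawn independently of the values (for comparison)."""
    rng = random.Random(seed)
    return [0 if rng.random() < p_missing else 1 for _ in values]


def observed_mean(values, mask):
    """Mean over the entries whose mask is 1 (observed)."""
    observed = [v for v, m in zip(values, mask) if m == 1]
    return sum(observed) / len(observed)


# Draw standard normal data, then hide entries under each mechanism.
data_rng = random.Random(42)
values = [data_rng.gauss(0.0, 1.0) for _ in range(20000)]

true_mean = sum(values) / len(values)
mnar_mean = observed_mean(values, make_mnar_mask(values))
mcar_mean = observed_mean(values, make_mcar_mask(values))

print(f"true mean:          {true_mean:+.3f}")
print(f"observed mean MNAR: {mnar_mean:+.3f}")
print(f"observed mean MCAR: {mcar_mean:+.3f}")
```

Under this assumed rule, the mean of the observed entries is pulled well below the true mean in the MNAR case, while the value-independent mask leaves it nearly unchanged. This is the kind of bias that motivates modeling the mask jointly with the observed values, as the abstract describes, rather than ignoring how the mask was generated.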
