Rethinking Incompleteness: Formalizing Protocol Divergence and Train-Once Learning for Robust IMVC

2026-06-03
Machine Learning
AI summary

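The authors identify a problem in how incomplete multi-view clustering models are usually evaluated: a separate model is retrained for each missing-data configuration, and the missing rate alone does not capture how hard a configuration really is. Two setups with the same nominal missing rate can differ by up to 50× in how many samples are fully observed. They introduce incompleteness divergence to measure such structural differences between missing-data protocols, and prove that for a broad class of reconstruction-based objectives learning becomes ill-posed, with near-random performance, once the share of complete samples drops below a critical threshold. To get around this, they propose CRAFT, a transformer that fuses only the observed views through attention masking, so a single model trained once on complete data handles diverse missing patterns at inference without retraining. On seven benchmarks, CRAFT matches or outperforms per-configuration baselines while reducing training overhead by 8.8×.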

missing data, multiview learning, data incompleteness, incompleteness divergence, reconstruction objectives, attention mechanism, transformers, masking, robustness, variable-length fusion
Authors
Haolu Liu, Xiyue Wang, Xuanting Xie, Liangjian Wen, Zhao Kang
Abstract
Standard evaluation of incomplete multi-view clustering (IMVC) retrains separate models for different missing-data configurations. We show that this paradigm obscures a fundamental vulnerability: missing rate alone is insufficient to characterize data incompleteness. Specifically, protocols with identical nominal missing rates can differ by up to $50\times$ in their proportion of fully observed samples, inducing drastically different learning regimes. We formalize this phenomenon as incompleteness divergence, providing measures that capture structural disparities across missing-data protocols. We further prove that for a broad class of reconstruction-based objectives, learning becomes structurally ill-posed when the proportion of complete samples falls below a critical threshold, leading to near-random performance. To bypass this theoretical bound, we propose CRAFT (Complete-data Robust Attention-masked Fusion Transformer). CRAFT shifts the burden of robustness from the loss function to the architecture via two key properties: (i) per-sample independence, which removes reliance on complete-sample co-occurrence, and (ii) mask-aware variable-length fusion, which aggregates only observed views through attention masking. This design allows a single model, trained once on complete data, to generalize to diverse missing patterns at inference time without retraining. Extensive experiments on seven benchmarks show that CRAFT matches or outperforms per-configuration baselines while reducing training overhead by $8.8\times$, demonstrating that robustness to missing data can be achieved as an inherent architectural property. Code (CRAFT) and our imvc-audit toolkit are available at https://anonymous.4open.science/r/CRAFT-BF80/ and https://anonymous.4open.science/r/imvc-audit-8263/.
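The claim that missing rate alone under-specifies a protocol can be illustrated with a small NumPy sketch. The two mask generators below are toy protocols assumed for illustration, not the protocols audited in the paper, and `make_mask`, `observed_missing_rate`, and `complete_ratio` are hypothetical helper names. Both protocols remove exactly the same number of (sample, view) entries; they differ only in how many views each incomplete sample loses.

```python
import numpy as np

def make_mask(n_samples, n_views, rate, views_dropped_per_sample, seed=0):
    """Return a boolean (n_samples, n_views) mask, True = view observed.

    Exactly round(rate * n_samples * n_views) entries are missing in total.
    They are packed into incomplete samples that each lose
    `views_dropped_per_sample` views (toy protocol, assumed for illustration).
    """
    rng = np.random.default_rng(seed)
    n_missing = round(rate * n_samples * n_views)
    n_incomplete, remainder = divmod(n_missing, views_dropped_per_sample)
    if remainder or n_incomplete > n_samples or not 0 < views_dropped_per_sample < n_views:
        raise ValueError("parameters do not yield a valid mask")
    mask = np.ones((n_samples, n_views), dtype=bool)
    # Pick which samples become incomplete, then which views each one loses.
    for i in rng.choice(n_samples, size=n_incomplete, replace=False):
        dropped = rng.choice(n_views, size=views_dropped_per_sample, replace=False)
        mask[i, dropped] = False
    return mask

def observed_missing_rate(mask):
    """Fraction of (sample, view) entries that are missing."""
    return 1.0 - mask.mean()

def complete_ratio(mask):
    """Fraction of samples with every view observed."""
    return mask.all(axis=1).mean()

# Same nominal missing rate (15% of 1000 x 6 entries = 900 missing entries).
spread = make_mask(1000, 6, 0.15, views_dropped_per_sample=1)  # 900 samples lose 1 view
packed = make_mask(1000, 6, 0.15, views_dropped_per_sample=5)  # 180 samples lose 5 views

for name, m in [("spread", spread), ("packed", packed)]:
    print(f"{name}: missing rate {observed_missing_rate(m):.2f}, "
          f"complete ratio {complete_ratio(m):.2f}")
```

In this toy setting the "spread" protocol leaves 100 of 1000 samples complete while the "packed" protocol leaves 820, although both report a 15% missing rate. The gap here is smaller than the $50\times$ reported in the abstract, which refers to the protocols studied by the authors.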
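Property (ii), aggregating only observed views through attention masking, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the CRAFT architecture: it uses single-query dot-product attention pooling, the `query` vector stands in for a learned parameter, and `masked_attention_fusion` is a hypothetical name. Each sample is fused on its own, with missing views assigned a score of negative infinity before the softmax so that they receive zero weight.

```python
import numpy as np

def masked_attention_fusion(view_tokens, observed, query):
    """Fuse per-view embeddings with single-query attention over observed views only.

    view_tokens: (n_samples, n_views, dim) embeddings; values at missing views are ignored
    observed:    (n_samples, n_views) boolean mask, True = view present
                 (assumes at least one observed view per sample)
    query:       (dim,) fusion query, a stand-in for a learned parameter
    """
    dim = view_tokens.shape[-1]
    scores = view_tokens @ query / np.sqrt(dim)          # (n_samples, n_views)
    scores = np.where(observed, scores, -np.inf)         # missing views get no attention
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Zero the missing tokens as well, so NaN placeholders cannot leak via 0 * nan.
    safe_tokens = np.where(observed[..., None], view_tokens, 0.0)
    fused = np.einsum("nv,nvd->nd", weights, safe_tokens)
    return fused, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 3, 8))   # 4 samples, 3 views, 8-dim embeddings
query = rng.normal(size=8)
observed = np.array([[1, 1, 1],
                     [1, 0, 1],
                     [0, 0, 1],
                     [1, 1, 0]], dtype=bool)

fused, weights = masked_attention_fusion(tokens, observed, query)

# Overwriting the missing views with NaN leaves the fused output unchanged.
corrupted = tokens.copy()
corrupted[~observed] = np.nan
fused_corrupted, _ = masked_attention_fusion(corrupted, observed, query)
print(np.allclose(fused, fused_corrupted))
```

Because the fused representation of a sample depends only on that sample's observed views, the same function handles any missing pattern at inference time, including a sample with a single view, whose output is simply that view's embedding.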