SPICE: Synergy and Partial Information Based Curriculum Evolution

2026-06-15 • Machine Learning

AI summary

The authors propose a new way to train AI models that use multiple types of data (such as images and text) by focusing on how those data types interact. They break the information in each example into parts that are shared between the data types, unique to one of them, or available only when they are combined, and use this breakdown to decide which examples the model should learn from first. The training plan changes as the model learns, starting with easy shared information, then moving to information specific to one data type, and finally to more complex combined interactions. Their experiments on multiple benchmarks show consistent improvements over conventional training and state-of-the-art baselines.

multimodal learning, curriculum learning, Partial Information Decomposition (PID), synergistic information, redundant information, unique information, sample complexity, progressive curriculum, adaptive learning, model evolution

Authors

Ankush Pratap Singh, Houwei Cao, Yong Liu

Abstract

Multimodal learning exploits complementary information across heterogeneous modalities. The informativeness of each modality can vary widely across samples and training stages. Existing multimodal curriculum learning strategies often assume that the relative complexity of samples remains unchanged throughout training and therefore cannot adapt to model evolution. We propose SPICE (Synergy and Partial Information based Curriculum Evolution), a novel progressive curriculum framework for multimodal interaction learning. Guided by Partial Information Decomposition (PID) theory, our approach decomposes multimodal interactions into redundant, unique, and synergistic information components, enabling an interpretable and dynamic characterization of sample complexity. Building on this decomposition, we design a progressive curriculum that evolves throughout training, allowing the model to transition from learning shared cross-modal cues to modality-specific patterns and, finally, to complex synergistic interactions. To adapt to model evolution, sample ordering is refined in real time using PID estimates derived from unimodal and multimodal predictions. Experiments across multiple multimodal benchmarks demonstrate consistent improvements over conventional training and state-of-the-art baselines, highlighting the effectiveness of PID-based decomposition and adaptive sample ordering for multimodal curriculum learning.
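
The abstract does not give the estimator or the scheduling rule, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a pointwise, minimum-information style redundancy proxy computed from the probability each model assigns to the true label, and a simple piecewise-linear schedule that shifts priority from redundant to unique to synergistic samples. All names here (`pid_proxies`, `stage_weights`, `order_samples`) and the schedule itself are hypothetical.

```python
import math

def pid_proxies(p1, p2, p12, prior):
    """Pointwise PID-style proxies for one sample (an assumed estimator).

    p1, p2 : probability each unimodal model assigns to the true label
    p12    : probability the multimodal model assigns to the true label
    prior  : marginal frequency of the true label
    Pointwise values can be negative when a model does worse than the prior.
    """
    i1 = math.log(p1 / prior)    # information from modality 1 alone
    i2 = math.log(p2 / prior)    # information from modality 2 alone
    i12 = math.log(p12 / prior)  # information from both modalities jointly
    redundancy = min(i1, i2)               # shared by both modalities
    unique = max(i1, i2) - redundancy      # extra held by the stronger modality
    synergy = i12 - i1 - i2 + redundancy   # only available jointly
    return redundancy, unique, synergy

def stage_weights(progress):
    """Hypothetical schedule: weights for (redundancy, unique, synergy).

    progress runs from 0.0 (start of training) to 1.0 (end); the three
    weights always sum to 1.
    """
    w_red = max(0.0, 1.0 - 2.0 * progress)
    w_uni = 1.0 - abs(2.0 * progress - 1.0)
    w_syn = max(0.0, 2.0 * progress - 1.0)
    return w_red, w_uni, w_syn

def order_samples(samples, progress):
    """Return sample indices, highest curriculum priority first.

    samples is a list of (p1, p2, p12, prior) tuples; in an adaptive
    setting these would be recomputed from current model predictions.
    """
    w_red, w_uni, w_syn = stage_weights(progress)
    scores = []
    for p1, p2, p12, prior in samples:
        red, uni, syn = pid_proxies(p1, p2, p12, prior)
        scores.append(w_red * red + w_uni * uni + w_syn * syn)
    return sorted(range(len(samples)), key=lambda i: -scores[i])

# Three toy samples with a balanced binary label (prior = 0.5):
toy = [
    (0.9, 0.9, 0.9, 0.5),  # both modalities suffice: mostly redundant
    (0.9, 0.5, 0.9, 0.5),  # only modality 1 helps: mostly unique
    (0.5, 0.5, 0.9, 0.5),  # only the combination helps: mostly synergistic
]
early = order_samples(toy, progress=0.0)
middle = order_samples(toy, progress=0.5)
late = order_samples(toy, progress=1.0)
```

With these toy inputs, the redundant sample ranks first early in training, the unique one at the midpoint, and the synergistic one at the end. Because the scores are recomputed from current predictions each time `order_samples` is called, the ordering can change as the models improve, which is the adaptive behavior the abstract describes.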
