In-Context Multiple Instance Learning

2026-06-04 | Machine Learning

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
AI summary

The authors study Multiple Instance Learning (MIL), in which labels are given for groups (bags) of data points rather than for individual points, a common setting in fields such as medical imaging. They find that when only a few labeled bags are available, existing methods either overfit or fail to adapt to the task. By pretraining a Perceiver-style in-context learner on synthetic data, they obtain a model that learns new tasks from a handful of labeled bags in a single forward pass, with no gradient updates at test time. They also compare several synthetic data generators for bag-structured data and show that a model pretrained on a mixture of them achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.

Multiple Instance Learning, In-context learning, Perceiver architecture, Synthetic data, Few-shot learning, Bag-structured data, Supervised learning, Overfitting, Inductive bias, Benchmark datasets
Authors
Alexander Möllers, Marvin Sextro, Julius Hense, Gabriel Dernbach, Klaus-Robert Müller
Abstract
Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications. Flexible models overfit and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate different synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
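
To make the setting concrete, the following is a minimal sketch of what a synthetic few-shot MIL task could look like. It is not one of the paper's generators: it assumes the standard MIL assumption (a bag is positive if and only if it contains at least one "witness" instance), and the function names `make_bag` and `make_task`, the Gaussian features, and all parameter values are hypothetical choices for illustration. An in-context learner would receive the labeled context bags together with the unlabeled query bags and predict the query labels in one forward pass.

```python
import random

def make_bag(rng, positive, n_instances=8, dim=4, shift=3.0):
    """Sample one bag: a list of instance feature vectors and a bag label.

    Assumption (not from the paper): standard MIL, where a bag is positive
    iff at least one instance comes from a shifted "witness" distribution.
    """
    # Background instances: standard normal features.
    bag = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_instances)]
    if positive:
        # Turn one or more instances into witnesses by shifting their features.
        n_witness = rng.randint(1, max(1, n_instances // 4))
        for i in rng.sample(range(n_instances), n_witness):
            bag[i] = [x + shift for x in bag[i]]
    return bag, int(positive)

def make_task(seed, n_context=6, n_query=4):
    """Build one task: labeled context bags plus query bags to classify."""
    rng = random.Random(seed)
    # Alternate labels so the task is balanced, then shuffle the order.
    bags = [make_bag(rng, positive=(i % 2 == 0)) for i in range(n_context + n_query)]
    rng.shuffle(bags)
    return bags[:n_context], bags[n_context:]

context, query = make_task(seed=0)
```

Only the bag labels are exposed here; which instances are witnesses stays hidden, which mirrors the bag-level supervision described in the abstract.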