Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?

2026-06-16 · Machine Learning

AI summary

The authors study dataset distillation (DD), which compresses a large training set into a small number of synthetic samples so that models can be trained more cheaply. They benchmark seven recent DD methods on ImageNet-1K, ImageNet100, and ImageNette under three standardized training protocols and compare them with three coreset selection (CS) strategies, which keep a subset of real examples instead of synthesizing new ones. Some DD methods fail to beat even a random subset of real samples, and the strongest ones perform comparably to or worse than coresets on large-scale datasets while costing substantially more to construct. When the authors measure how well the condensed sets represent the original data, coresets also achieve better coverage of the data distribution. Overall, they conclude that current DD methods offer no clear advantage over simpler, cheaper coreset methods.
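The summary contrasts synthetic sets with random real samples and coresets, and compares them on data coverage. The following is a minimal sketch (not the authors' code) that makes those three ideas concrete on feature vectors, assumed here to be embeddings from some pretrained encoder. The abstract does not name its three CS strategies, so greedy k-center selection stands in as one well-known example; `coverage_radius` is likewise a hypothetical coverage measure (worst-case nearest-neighbor distance), not necessarily the metric the paper reports.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def random_subset(n: int, k: int, seed: int = 0) -> List[int]:
    """Random-subset baseline: k distinct indices drawn uniformly from range(n)."""
    return random.Random(seed).sample(range(n), k)


def k_center_greedy(points: List[Vector], k: int, first: int = 0) -> List[int]:
    """Greedy k-center selection (illustrative CS strategy, assumed, not taken from the paper).

    Starting from `first`, repeatedly add the point farthest from everything selected so far.
    """
    selected = [first]
    # nearest[i]: distance from point i to its closest already-selected point
    nearest = [dist(p, points[first]) for p in points]
    while len(selected) < k:
        far = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[far] == 0.0:
            break  # every point already coincides with a selected one
        selected.append(far)
        nearest = [min(d, dist(p, points[far])) for d, p in zip(nearest, points)]
    return selected


def coverage_radius(points: List[Vector], subset: List[int]) -> float:
    """Hypothetical coverage score: worst-case distance from a real point to its nearest selected point.

    Lower means the subset covers the full data more tightly.
    """
    return max(min(dist(p, points[j]) for j in subset) for p in points)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D "features" standing in for real image embeddings
    data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    for name, idx in [("random", random_subset(len(data), 10)),
                      ("k-center", k_center_greedy(data, 10))]:
        print(name, round(coverage_radius(data, idx), 3))
```

Greedy k-center directly targets the worst-case radius, which is why a maximum-distance score pairs naturally with it; a mean nearest-neighbor distance would be an equally reasonable alternative measure.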

Dataset Distillation, Coreset Selection, Data Compression, Synthetic Samples, ImageNet, Training Protocols, Data Representativeness, Data Diversity, Machine Learning Efficiency
Authors
Trisha Mittal, Akshay Mehra, Joshua Kimball
Abstract
Dataset distillation (DD) has emerged as a prominent approach in data-centric machine learning. It aims to synthesize compact training sets for efficient training by compressing the information in large datasets into a small number of synthetic samples. However, DD methods are often evaluated under inconsistent protocols, ranging from standard ERM to single- and multi-teacher supervision, which makes it difficult to separate the effectiveness of the distilled data from the effect of the evaluation protocol. Moreover, many prior works claim that DD outperforms data pruning approaches such as coreset selection (CS), based on the assumption that restricting condensed datasets to subsets of real samples fundamentally limits their expressiveness. In this work, we critically evaluate DD methods through large-scale experiments using standardized datasets and evaluation protocols to assess their intrinsic effectiveness. We benchmark seven state-of-the-art (SOTA) DD methods on ImageNet-1K, ImageNet100, and ImageNette against three CS strategies, using three widely adopted training protocols. Our results show that some DD methods fail to outperform even simple random subsets, and that the SOTA DD approaches perform comparably to or worse than coresets on large-scale datasets while incurring a substantially higher construction cost. Beyond accuracy, we also evaluate the representativeness, diversity, and quality of condensed sets, and find that coresets consistently achieve better coverage of the original data distribution. These findings highlight the limited practical advantages of current DD methods and show that coresets remain competitive and are often a more computationally efficient alternative for data-centric learning.
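The concern about inconsistent protocols is that the same condensed set yields different training signals depending on how it is used. A minimal sketch, assuming the common knowledge-distillation formulation: ERM trains on cross-entropy against each sample's hard label, while teacher supervision matches the student's temperature-softened outputs to a teacher's via KL divergence, scaled by T². Averaging teacher probabilities in the multi-teacher case and the default `temperature=4.0` are illustrative assumptions, not settings reported in the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def erm_loss(student_logits: Sequence[float], label: int) -> float:
    """Standard ERM: cross-entropy against the hard label of a condensed sample."""
    return -math.log(softmax(student_logits)[label])


def kd_loss(student_logits: Sequence[float],
            teacher_logits_list: Sequence[Sequence[float]],
            temperature: float = 4.0) -> float:
    """Teacher supervision: KL(teacher || student) on temperature-softened outputs.

    One entry in teacher_logits_list gives single-teacher supervision; several give
    multi-teacher supervision, here (by assumption) by averaging teacher probabilities.
    """
    teachers = [softmax(t, temperature) for t in teacher_logits_list]
    target = [sum(col) / len(teachers) for col in zip(*teachers)]
    student = softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q)) for p, q in zip(target, student) if p > 0)
    # T^2 scaling keeps gradient magnitudes comparable across temperatures
    return kl * temperature ** 2


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    teacher_a, teacher_b = [3.0, 1.0, -2.0], [1.0, 2.0, -1.0]
    print("ERM:", erm_loss(student, label=0))
    print("single-teacher:", kd_loss(student, [teacher_a]))
    print("multi-teacher:", kd_loss(student, [teacher_a, teacher_b]))
```

Because soft labels from a teacher carry information beyond the hard label, accuracy measured under teacher supervision partly reflects the teacher rather than the condensed data alone, which is why comparing DD and CS under one fixed protocol matters.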