Robust Trajectory Distillation: Hybrid Reweighting Meets Teacher-Inspired Targets
2026-06-29 • Computer Vision and Pattern Recognition
AI summary
The authors study how to shrink big, messy datasets into smaller, cleaner ones that still work well for training machine learning models. They focus on problems caused by noisy (incorrect) labels, which can confuse the training process. To address this, they propose a method that tracks how a teacher model's training evolves over time to find and downweight noisy examples, without needing extra clean data or label fixes. The approach combines two techniques, SGR and TIAT, to create smaller datasets that keep important information and are more reliable when labels are messy. The authors report that it works well across different types of noise and is computationally lightweight.
Dataset distillation, Noisy labels, Selective Guidance Reweighting, Teacher-Inspired Auxiliary Targets, Model robustness, Label noise, Data condensation, Machine learning training, Noise suppression, Second-split forgetting
Authors
Kaifeng Chen, Lechao Cheng, Jiyang Li, Shengeng Tang, Fan Zhang, Yantao Pan, Yaxiong Wang, Tuanrui Hui, Zhun Zhong
Abstract
Dataset distillation (DD) condenses large corpora into compact, information-rich subsets for efficient training and reuse. However, under noisy supervision, DD risks condensing corrupted associations together with useful signals, degrading robustness. Conventional noisy-label remedies (sample selection, loss weighting, label correction) tightly couple noise estimation with model optimization, often require clean anchors, and can amplify confirmation bias; these assumptions are misaligned with DD's goal of compact, plug-and-play supervision. We therefore propose a trajectory-based DD framework that jointly suppresses noise and preserves transferable knowledge without relabeling or clean subsets. It comprises two complementary components: Selective Guidance Reweighting (SGR), which fuses global forgetting patterns (second-split forgetting) with local neighborhood consistency into a progressive reweighting scheme that prioritizes clean supervision along the teacher trajectory; and Teacher-Inspired Auxiliary Targets (TIAT), which inject auxiliary residual guidance distilled from intermediate teacher dynamics to reinforce informative signals while remaining internally consistent. Together, SGR and TIAT produce distilled datasets with cleaner and richer representations under noisy supervision. The framework is robust, label-preserving, computationally lightweight, and broadly applicable, yielding consistent gains over state-of-the-art DD baselines across symmetric, asymmetric, and real-world noise.
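
To make the idea behind SGR more concrete, the following is a minimal illustrative sketch of how a global forgetting signal and a local neighborhood-consistency signal might be fused into progressively sharpened sample weights. It is not the authors' implementation: the abstract does not give the formulas, so the function names (`knn_label_agreement`, `hybrid_weights`), the equal 0.5/0.5 fusion, the `progress` exponent schedule, and the reading of `forget_epoch` (an earlier forgetting epoch during second-split training is treated as more suspicious) are all assumptions made purely for illustration.

```python
import numpy as np


def knn_label_agreement(features, labels, k=5):
    """Fraction of each sample's k nearest neighbours that share its label.

    Hypothetical stand-in for "local neighborhood consistency".
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances via the expansion of |a - b|^2.
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    return (labels[nn] == labels[:, None]).mean(axis=1)


def hybrid_weights(forget_epoch, features, labels, progress, k=5):
    """Fuse a global forgetting score with local label agreement.

    forget_epoch: assumed per-sample epoch at which the sample is forgotten
        during second-split training (earlier = more likely noisy).
    progress: value in [0, 1] giving the position along the teacher
        trajectory; 0 yields uniform weights, 1 yields the sharpest weights.
    """
    f = np.asarray(forget_epoch, dtype=float)
    span = f.max() - f.min()
    # Min-max normalise so that late-forgotten samples score close to 1.
    global_score = (f - f.min()) / span if span > 0 else np.ones_like(f)
    local_score = knn_label_agreement(features, labels, k)
    # Assumed fusion rule: a plain average of the two scores.
    fused = 0.5 * global_score + 0.5 * local_score
    # Assumed progressive schedule: the exponent grows with progress.
    w = fused ** progress
    return w / max(w.sum(), 1e-12)


if __name__ == "__main__":
    # Two well-separated clusters; sample 3 sits in cluster A but is labelled 1.
    X = [[0, 0], [0, 1], [1, 0], [1, 1],
         [10, 10], [10, 11], [11, 10], [11, 11]]
    y = [0, 0, 0, 1, 1, 1, 1, 1]
    forget = [10, 10, 10, 1, 10, 10, 10, 10]
    for p in (0.0, 0.5, 1.0):
        print(p, np.round(hybrid_weights(forget, X, y, progress=p, k=3), 3))
```

In this toy setup the mislabeled sample is both forgotten early and surrounded by neighbours with a different label, so its weight shrinks as `progress` increases, while the weights start out uniform at `progress = 0`. A real pipeline would compute features and forgetting statistics from the teacher's training run rather than from hand-written arrays.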