Restoring Initial Noise Sensitivity in Text-to-Image Distillation via Geometric Alignment
2026-06-01 • Computer Vision and Pattern Recognition
AI summary
The authors show that distilling text-to-image models into few-step students speeds up generation but can lose an important property: sensitivity to the initial noise. Downstream control methods that optimize or manipulate the noise depend on this sensitivity. Standard distillation objectives align outputs point by point, which flattens the input-output landscape, so the authors propose Geometry-Aware Distillation (GAD), which trains the student to reproduce the teacher's response to small noise perturbations. Their experiments show that GAD restores sensitivity and improves diversity while maintaining visual fidelity.
text-to-image generation, generative distillation, sensitivity to noise, Jacobian-vector product, student-teacher model, local geometry, noise-based optimization, model distillation, trajectory compression
Authors
Huayang Huang, Ruoyu Wang, Jinhui Zhao, Wei Deng, Daiguo Zhou, Jian Luan, Yu Wu, Ye Zhu
Abstract
Generative distillation significantly accelerates text-to-image (T2I) generation by compressing multi-step trajectories into few-step student models while preserving perceptual quality. However, existing methods primarily optimize efficiency and output fidelity, often neglecting critical properties of the original trajectory. In this work, we identify a key missing property: sensitivity to initial noise, whose degradation impairs downstream control methods relying on noise-based optimization and manipulation. We trace this issue to standard distillation objectives that enforce pointwise output alignment, inadvertently flattening the input-output landscape and suppressing the teacher's local geometric structure. To address this, we propose Geometry-Aware Distillation (GAD), a sensitivity-preserving framework that aligns the local functional behavior of teacher and student models. Specifically, GAD matches Jacobian-vector products with respect to input noise, enabling the student to reproduce the teacher's differential response to perturbations. Extensive experiments across multiple T2I paradigms and noise-driven control tasks demonstrate that GAD significantly restores sensitivity and improves diversity while maintaining high visual fidelity. Code is available at https://github.com/Hannah1102/GAD.
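To make the difference between pointwise output alignment and Jacobian-vector product (JVP) matching concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact GAD objective, so the loss form (squared output error plus a weighted squared JVP error), the finite-difference JVP estimate, the weight `lam`, and the toy `teacher` and `flat_student` functions are all assumptions made for illustration.

```python
import math


def jvp_fd(f, z, v, eps=1e-5):
    """Central finite-difference estimate of the Jacobian-vector product J_f(z) v."""
    z_plus = [zi + eps * vi for zi, vi in zip(z, v)]
    z_minus = [zi - eps * vi for zi, vi in zip(z, v)]
    return [(p - m) / (2.0 * eps) for p, m in zip(f(z_plus), f(z_minus))]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def teacher(z):
    # Hypothetical toy "teacher": its output varies strongly with the noise z.
    return [math.sin(3.0 * z[0]) + z[1], z[0] * z[1]]


def flat_student(z):
    # Hypothetical collapsed "student": agrees with the teacher at z = (0, 0)
    # but ignores the noise entirely.
    return [0.0, 0.0]


def geometry_aware_loss(student, teacher_fn, z, v, lam=1.0):
    """Return (total, output_term, jvp_term) for one noise sample z and direction v.

    Assumed form (not taken from the paper): squared output error plus
    lam times the squared error between student and teacher JVPs.
    """
    output_term = sq_dist(student(z), teacher_fn(z))
    jvp_term = sq_dist(jvp_fd(student, z, v), jvp_fd(teacher_fn, z, v))
    return output_term + lam * jvp_term, output_term, jvp_term


z0 = [0.0, 0.0]  # noise sample where both toy models output (0, 0)
v0 = [1.0, 0.0]  # perturbation direction in noise space

total, output_term, jvp_term = geometry_aware_loss(flat_student, teacher, z0, v0)
print("output term:", output_term)
print("JVP term:", jvp_term)
```

At `z0` the two toy models produce identical outputs, so a purely pointwise objective sees no error, while the JVP term is large because the teacher's directional derivative along `v0` is (3, 0) and the flat student's is zero. In a real training setup the JVP would more likely come from forward-mode automatic differentiation than from finite differences.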