TEASR: Training-Efficient Any-Step Diffusion Transformer for Real-World Image Super-Resolution

2026-06-15 • Computer Vision and Pattern Recognition

AI summary

The authors introduce TEASR, a method that speeds up and improves image super-resolution with diffusion models, which are typically slow to run. Unlike previous approaches, it needs no extra teacher models, so that even very large models can be distilled on a single GPU. TEASR can generate images in either one step or multiple steps using the same model, allowing users to balance speed and quality. The authors also design a dual-branch transformer structure to better handle noise during image restoration. According to the authors, tests show that TEASR outperforms other leading methods on several datasets.

Diffusion models, Image super-resolution, Self-adversarial distillation, One-step sampling, Multi-step sampling, Timestep-aware rectification, Diffusion transformer, Noise conditioning, Generative priors, Training efficiency

Authors

Xiang Gao, Chenxin Zhu, Yushun Fang, Qiang Hu, Xiaoyun Zhang

Abstract

Diffusion models excel in Real-World Image Super-Resolution (Real-ISR) due to their powerful generative priors but suffer from slow iterative sampling. Although existing one-step distillation methods accelerate inference, they typically require auxiliary teacher models that inflate training memory and restrict scalability to large-scale architectures. Furthermore, these fixed-step models lack the flexibility to trade off speed for quality. In this paper, we propose TEASR, a training-efficient any-step diffusion framework for Real-ISR that enables both one-step and multi-step restoration within a unified model. Our key idea is to perform self-adversarial distillation within a single diffusion model, eliminating the need for auxiliary teachers or discriminators. Specifically, we propose a timestep-aware rectification strategy that stabilizes one-step generation across noise levels. Together, these two designs enable the distillation of 20B-parameter diffusion models on a single GPU, significantly improving training efficiency. Moreover, we introduce a dual-branch diffusion transformer with a decoupled timestep condition that separates the current noise state from the denoising target to enhance sampling quality. Extensive experiments demonstrate that TEASR supports seamless any-step sampling and consistently outperforms state-of-the-art methods across multiple datasets.
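To make the idea of any-step sampling with a decoupled timestep condition concrete, here is a minimal toy sketch. It is not TEASR's implementation: the names `toy_model` and `any_step_sample`, the linear noise path x_t = (1 - t) * x0 + t * noise, the evenly spaced schedule, and the use of the low-resolution input as the "restored" estimate are all assumptions made purely for illustration. The only point it shows is that one model taking both the current noise level `t` and the target level `s` can be called once or several times by the same loop.

```python
import random


def toy_model(x_t, t, s, lr):
    """Hypothetical stand-in for a network conditioned on two timesteps.

    t is the current noise level of x_t; s is the noise level to move to.
    This toy uses the low-resolution input lr as its clean estimate, where a
    real restoration network would predict one.
    """
    x0_hat = lr  # placeholder "restoration", an assumption for illustration
    # Under the assumed path x_t = (1 - t) * x0 + t * noise, the velocity
    # (noise - x0) is constant and equals (x_t - x0) / t.
    velocity = [(xt - x0) / t for xt, x0 in zip(x_t, x0_hat)]
    # Move from noise level t to noise level s along that straight path.
    return [xt + (s - t) * v for xt, v in zip(x_t, velocity)]


def any_step_sample(model, lr, num_steps, seed=0):
    """Run the same model for an arbitrary number of steps from t=1 to t=0."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in lr]  # start from pure noise at t = 1
    # Evenly spaced noise levels, e.g. num_steps=4 gives 1, 0.75, 0.5, 0.25, 0.
    levels = [1.0 - i / num_steps for i in range(num_steps + 1)]
    for t, s in zip(levels[:-1], levels[1:]):
        x = model(x, t, s, lr)
    return x


lr = [0.2, 0.5, 0.9]  # toy "image" with three pixel values
one_step = any_step_sample(toy_model, lr, num_steps=1)
four_step = any_step_sample(toy_model, lr, num_steps=4)
```

In this toy the one-step and four-step results coincide (up to floating-point error) because the stand-in model is exact along the assumed path. With a learned model the two would generally differ, which is where the speed-versus-quality trade-off described in the abstract would come from.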
