TUDSR: Twice Upsampling-Diffusion for Higher Super-Resolution

2026-06-08 • Computer Vision and Pattern Recognition

AI summary

The authors address the problem that current diffusion-based image upscaling models struggle to create very high-resolution images, especially when the desired enlargement is much bigger than what the model was originally trained for. They propose a new method called TUDSR, which uses a two-step training process involving progressively larger image sizes and a specialized training strategy to handle big images in chunks. Their approach improves the quality of super-resolution images at sizes like 1024x1024 and 2048x2048 without needing extremely large models. Experiments show their method outperforms previous techniques on several benchmarks.

diffusion models, image super-resolution, tiled diffusion, upsampling ratio, generator, discriminator, GAN architecture, high-resolution image synthesis, chunk-based training, SD2.1-base

Authors

Zhiqiang Wu, Yitong Dong, Xian Wei

Abstract

Diffusion-based generative models have achieved remarkable success in real-world image super-resolution (SR). With tiled diffusion techniques, these models can produce high-resolution images that exceed their natively supported resolution. However, the quality of such high-resolution (e.g., $2048^2$) outputs often remains extremely poor. We attribute this primarily to two factors: the image upsampling ratio (e.g., $\times8$) exceeding the model's natively supported upsampling ratio (e.g., $\times4$), and the model's natively supported resolution itself. In practice, training a native high-resolution model requires larger architectures, which incur significant computational overhead and GPU memory costs, making such training impractical on resource-limited hardware. We therefore present TUDSR, a Twice Upsampling-Diffusion framework for higher SR. The TUDSR framework consists mainly of two stages: the first trains at $R$-resolution, and the second introduces a looped chunk-based training strategy at $NR$-resolution. Each stage adopts a one-step GAN architecture comprising a generator and a discriminator. Based on SD2.1-base, we develop TUDSR-S, which achieves state-of-the-art performance across multiple benchmarks. Extensive experiments further demonstrate that TUDSR-S generates high-quality images at resolutions of $1024^2$ and even $2048^2$, significantly outperforming existing approaches. Code is available at https://github.com/wuer5/TUDSR.
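To make the tiling idea mentioned in the abstract concrete, the following is a minimal sketch of how a large output canvas can be covered by overlapping tiles of a model's native size. It is not taken from the TUDSR code: the function names `tile_starts` and `tile_boxes`, the 512-pixel tile size, and the 64-pixel overlap are all assumptions chosen for illustration, and the abstract does not say how TUDSR's chunk-based training partitions images.

```python
# Illustrative sketch only: generic overlapping-tile layout, not the TUDSR implementation.
# Tile size (512) and overlap (64) are assumed values, not taken from the paper.

def tile_starts(length, tile, overlap):
    """Return start offsets of tiles of size `tile` that cover [0, length).

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    placed flush with the far edge so nothing is left uncovered.
    """
    if tile >= length:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile size")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile aligned to the edge
    return starts


def tile_boxes(height, width, tile=512, overlap=64):
    """Return (top, left, bottom, right) boxes covering a height x width canvas."""
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    for side in (1024, 2048):
        boxes = tile_boxes(side, side)
        print(f"{side}x{side}: {len(boxes)} tiles of up to 512x512")
```

Under these assumed settings, each axis of a $2048^2$ canvas needs five tile positions, so a single-resolution model has to process many partially overlapping tiles whose seams must then be blended. This is the regime in which the abstract reports that output quality degrades.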
