InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting

2026-03-24

Computer Vision and Pattern Recognition · Artificial Intelligence
AI summary

The authors found that fast, few-step image inpainting methods often produce mismatches between the preserved parts of an image and the filled-in areas because they start from random noise. They propose InverFill, a technique that injects information from the original image into the initial noise, improving results even with very few sampling steps. The method requires no inpainting-specific training yet matches the quality of specialized inpainting models. Experiments show that InverFill improves both image quality and how well the results align with the text descriptions.

diffusion models, image inpainting, sampling steps, semantic alignment, Gaussian noise, text-to-image models, inversion method, few-step generation, image fidelity, blended sampling
Authors
Duc Vu, Kien Nguyen, Trong-Tung Nguyen, Ngan Nguyen, Phong Nguyen, Khoi Nguyen, Cuong Pham, Anh Tran
Abstract
Recent diffusion-based models achieve photorealism in image inpainting but require many sampling steps, limiting practical use. Few-step text-to-image models offer faster generation, but naively applying them to inpainting yields poor harmonization between the background and the inpainted region, along with visible artifacts. We trace the cause to random Gaussian noise initialization, which under a low number of function evaluations (NFEs) leads to semantic misalignment and reduced fidelity. To overcome this, we propose InverFill, a one-step inversion method tailored for inpainting that injects semantic information from the input masked image into the initial noise, enabling high-fidelity few-step inpainting. Instead of training inpainting models, InverFill leverages few-step text-to-image models in a blended sampling pipeline with semantically aligned noise as input, significantly improving vanilla blended sampling and even matching specialized inpainting models at low NFEs. Moreover, InverFill does not require real-image supervision and adds only minimal inference overhead. Extensive experiments show that InverFill consistently boosts baseline few-step models, improving image quality and text coherence without costly retraining or heavy iterative optimization.
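The abstract's core idea — replace the random Gaussian initialization of blended sampling with a noise latent derived from the masked input — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: InverFill learns its one-step inversion, whereas the `one_step_inversion` stand-in below simply applies forward diffusion to the masked image, and `denoise_step` is a hypothetical placeholder for a few-step denoiser. The blending rule (re-imposing the noised background outside the mask at every step) is the standard blended-sampling pipeline the abstract refers to.

```python
import numpy as np

def one_step_inversion(masked_image, alpha_bar_T, rng):
    # Hypothetical stand-in for InverFill's learned inversion: push the
    # masked image to the noise level of the initial timestep so the
    # starting latent carries image semantics instead of pure noise.
    noise = rng.standard_normal(masked_image.shape)
    return np.sqrt(alpha_bar_T) * masked_image + np.sqrt(1 - alpha_bar_T) * noise

def blended_inpaint(image, mask, denoise_step, alpha_bars, rng):
    """Few-step blended sampling; mask == 1 marks the region to inpaint.

    alpha_bars is ordered from t=0 (near 1, low noise) to t=T (near 0).
    denoise_step(x, t) is a placeholder for the few-step model update.
    """
    # Semantically aligned initialization from the masked (background) image.
    x = one_step_inversion(image * (1 - mask), alpha_bars[-1], rng)
    for t in reversed(range(len(alpha_bars))):
        x = denoise_step(x, t)  # model update on the full latent
        # Noise the known background to the current timestep and re-blend,
        # so only the masked region is freely generated.
        noise = rng.standard_normal(image.shape)
        bg_t = np.sqrt(alpha_bars[t]) * image + np.sqrt(1 - alpha_bars[t]) * noise
        x = (1 - mask) * bg_t + mask * x
    return x
```

Swapping `one_step_inversion` back to `rng.standard_normal(image.shape)` recovers vanilla blended sampling, which is the baseline the paper reports improving on at low NFEs.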