Improving Image-to-Image Translation via a Rectified Flow Reformulation

2026-03-20
Computer Vision and Pattern Recognition
AI summary

The authors introduce I2I-RFR, a method that upgrades image-to-image (I2I) models by recasting them as continuous-time transport models, enabling progressive refinement of results. Rather than adopting a complex generative model, their method adds a noise-corrupted version of the target as extra input channels and trains with a t-reweighted pixel loss that enables stepwise improvement at inference time. This keeps training simple while improving detail and perceptual quality across a range of image and video tasks. In most cases, adopting it requires only expanding the input channels and running a few refinement steps, making it a lightweight upgrade for existing I2I models.
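The few-step refinement described above can be sketched as an explicit Euler solver over an induced velocity field. This is a minimal illustration, not the authors' implementation: the function names are ours, and we assume the network predicts the clean target, so the induced rectified-flow velocity is v(x_t, t) = (x_hat - x_t) / (1 - t).

```python
import numpy as np

def refine(model, source, steps=3, seed=0):
    """Few-step ODE refinement (sketch; assumes x-prediction).

    model(inp, t) takes the channel-wise concatenation of the source
    image and the current estimate, and returns a clean-target estimate.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(source.shape)  # start from pure noise at t = 0
    for i in range(steps):
        t = i / steps                      # t in {0, 1/steps, ...}; 1 - t > 0
        inp = np.concatenate([source, x], axis=0)  # expanded input channels
        x_hat = model(inp, t)
        v = (x_hat - x) / (1.0 - t)        # induced velocity (our assumption)
        x = x + v / steps                  # explicit Euler step of size 1/steps
    return x
```

With a perfect x-prediction oracle, this schedule lands exactly on the target after the final step, which is why a few explicit steps can suffice.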

image-to-image translation, rectified flow, continuous-time transport, pixel-wise regression, ODE solver, image restoration, generative models, perceptual quality, supervised training, multimodal targets
Authors
Satoshi Iizuka, Shun Okamoto, Kazuhiro Fukui
Abstract
In this work, we propose Image-to-Image Rectified Flow Reformulation (I2I-RFR), a practical plug-in reformulation that recasts standard I2I regression networks as continuous-time transport models. While pixel-wise I2I regression is simple, stable, and easy to adapt across tasks, it often over-smooths ill-posed and multimodal targets, whereas generative alternatives often require additional components, task-specific tuning, and more complex training and inference pipelines. Our method augments the backbone input by channel-wise concatenation with a noise-corrupted version of the ground-truth target and optimizes a simple t-reweighted pixel loss. This objective admits a rectified-flow interpretation via an induced velocity field, enabling ODE-based progressive refinement at inference time while largely preserving the standard supervised training pipeline. In most cases, adopting I2I-RFR requires only expanding the input channels, and inference can be performed with a few explicit solver steps (e.g., 3 steps) without distillation. Extensive experiments on multiple image-to-image translation and video restoration tasks show that I2I-RFR generally improves performance across a wide range of backbones, with particularly clear gains in perceptual quality and detail preservation. Overall, I2I-RFR provides a lightweight way to incorporate continuous-time refinement into conventional I2I models without requiring a heavy generative pipeline.
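The training side of the abstract (concatenating a noise-corrupted target and optimizing a t-reweighted pixel loss) can be sketched as follows. This is a hedged illustration under our own assumptions: the interpolation x_t = (1 - t) * noise + t * target is the standard rectified-flow path, and the specific weight w(t) = 1 + t is a placeholder, since the paper's exact weighting is not given here.

```python
import numpy as np

def make_training_input(source, target, rng):
    """Build one I2I-RFR training input (sketch; names are ours)."""
    t = rng.uniform()                                  # random time in [0, 1]
    noise = rng.standard_normal(target.shape)
    x_t = (1.0 - t) * noise + t * target               # noise-corrupted target
    net_input = np.concatenate([source, x_t], axis=0)  # expanded input channels
    return net_input, t

def t_reweighted_pixel_loss(pred, target, t):
    """Pixel MSE with a t-dependent weight; w(t) = 1 + t is an assumption."""
    w = 1.0 + t
    return w * np.mean((pred - target) ** 2)
```

The appeal is that this is still a standard supervised regression loop: only the input layer's channel count and the loss weighting change.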