RS-Gen: A Multi-Stage Agentic Framework for Reasoning and Search-Augmented Image Generation

2026-06-22

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors present RS-Gen, a system that helps image generation and editing models handle ambiguous instructions by asking itself questions and then resolving them step by step. Existing models struggle with logical reasoning and with knowledge outside their training data; RS-Gen uses a closed loop to identify and fill these gaps at inference time, with no extra training. On the WISE Verified and RISEBench benchmarks, RS-Gen improves Qwen-Image by 0.313 and Qwen-Image-Edit-2511 by 19.70 in absolute terms, placing both among the best open-source models.

Image generation, Image editing, Logical reasoning, Out-of-Distribution (OOD) knowledge, Unified understanding-and-generation models, Agentic framework, Closed-loop mechanism, Benchmark testing, Open-source models, RS-Gen
Authors
Feifei Bian, Zhimin Zheng, Wei Deng, Daiguo Zhou, Jian Luan
Abstract
Recent years have witnessed remarkable progress in image generation and editing, particularly regarding instruction following and visual fidelity. However, when handling ambiguous intentions, logical reasoning, and Out-of-Distribution (OOD) knowledge, existing image models often yield sub-optimal results due to a lack of deep reasoning capabilities and real-time external information. Although emerging unified understanding-and-generation models attempt to bridge this gap, they remain constrained by their intrinsic parameter scales and static knowledge gaps. Inspired by agentic paradigms, we propose RS-Gen: a plug-and-play, training-free, multi-stage image agentic framework. RS-Gen innovatively introduces a "Questioning-and-Solving" closed-loop mechanism to accurately identify logical issues and knowledge gaps, autonomously planning actions to bridge information deficits and execute deep logical reasoning. Extensive experiments demonstrate that RS-Gen significantly expands the capability boundaries of foundational image generation and editing models. Specifically, on the WISE Verified and RISEBench benchmarks, RS-Gen yields substantial absolute performance gains of 0.313 for Qwen-Image and 19.70 for Qwen-Image-Edit-2511, respectively, successfully elevating both to the state-of-the-art (SOTA) level among open-source models.
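
The abstract describes the "Questioning-and-Solving" loop only at a high level, so the following is a minimal illustrative sketch of what such a closed loop might look like, not the authors' implementation. Every name here (`AgentState`, `run_closed_loop`, `enriched_prompt`, `toy_ask`, `toy_solve`, `TOY_FACTS`) is hypothetical, and the "search" step is a dictionary lookup standing in for whatever reasoning or retrieval a real system would use. The loop asks which questions remain open, resolves them, and repeats until nothing is open or a round limit is reached; the resolved facts are then attached to the prompt passed to an unmodified image model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Hypothetical state carried across loop rounds."""
    prompt: str
    resolved: Dict[str, str] = field(default_factory=dict)  # question -> answer
    rounds: int = 0


def run_closed_loop(
    prompt: str,
    ask: Callable[[AgentState], List[str]],
    solve: Callable[[str], str],
    max_rounds: int = 3,
) -> AgentState:
    """Alternate questioning and solving until no open questions remain."""
    state = AgentState(prompt=prompt)
    while state.rounds < max_rounds:
        # Questioning: collect gaps that have not been resolved yet.
        open_questions = [q for q in ask(state) if q not in state.resolved]
        if not open_questions:
            break
        # Solving: resolve each gap (search or reasoning in a real system).
        for question in open_questions:
            state.resolved[question] = solve(question)
        state.rounds += 1
    return state


def enriched_prompt(state: AgentState) -> str:
    """Attach resolved facts to the prompt for a downstream image model."""
    if not state.resolved:
        return state.prompt
    facts = "; ".join(state.resolved.values())
    return f"{state.prompt} [context: {facts}]"


# Toy stand-ins (assumptions for illustration only, not real data).
TOY_FACTS = {"zorblat": "a made-up term standing in for an unknown entity"}


def toy_ask(state: AgentState) -> List[str]:
    """Flag prompt words that the toy knowledge base can explain."""
    words = state.prompt.lower().split()
    return [f"What is '{w}'?" for w in words if w in TOY_FACTS]


def toy_solve(question: str) -> str:
    """Answer a flagged question from the toy knowledge base."""
    term = question.split("'")[1]
    return f"{term}: {TOY_FACTS[term]}"


if __name__ == "__main__":
    final = run_closed_loop("a zorblat on a beach", toy_ask, toy_solve)
    print(enriched_prompt(final))
```

Keeping `ask` and `solve` as injected callables mirrors the plug-and-play, training-free framing in the abstract: the image model itself is untouched and only receives an enriched prompt.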