Safe Few-Step Generation via Velocity Editing

2026-06-22 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition • Computers and Society

AI summary

The authors address safety in flow matching text-to-image models, which generate images in very few sampling steps. Existing safety methods rely on many denoising steps or on editing CLIP prompt embeddings, so they transfer poorly to this setting. The authors propose VESFlow, a training-free method that edits the underlying velocity field directly to steer generation toward safe outputs while leaving the input prompt unchanged. A risk score filter skips the edit for benign prompts, and VESFlow+, a stronger variant, both moves the velocity toward the safe direction and pushes it away from the unsafe one. In experiments on a 4-step MeanFlow model, VESFlow+ lowers the NudeNet attack success rate to 6.3% on Ring-A-Bell and 6.8% on MMA-Diffusion while preserving fidelity on benign prompts.

Flow Matching, Text-to-Image Generation, Safety Filtering, Velocity Field, Denoising Steps, Concept Removal, Prompt Conditioning, Risk Scoring, Sampling Steps, Image Generation Fidelity

Authors

Yujin Choi, Jaehong Yoon

Abstract

Flow matching has recently emerged as a strong paradigm for state-of-the-art text-to-image (T2I) generation, enabling high-quality generation with a small number of sampling steps. As these models are increasingly integrated into real-world applications, ensuring safe and non-sensitive content generation has become a critical requirement. However, adapting safety and concept removal methods to this new generation framework remains an open challenge. Specifically, prior methods largely rely on iterative trajectory steering across many denoising steps or on CLIP-centric prompt embedding manipulation. These design assumptions pose fundamental bottlenecks for safety in flow matching-based T2I generation, where limited sampling steps constrain iterative correction and modern context-aware text encoders diminish the effectiveness of embedding-level interventions. In this paper, we propose VESFlow, a training-free safety method tailored to flow matching with extremely few sampling steps. Leveraging the fact that flow matching models learn the marginal velocity, we directly edit the velocity field via a safe-conditional posterior. VESFlow steers the trajectory toward safe outputs while leaving the conditioning prompt unchanged. Building on the observation that VESFlow leaves outputs unchanged under benign prompts, we further introduce risk score-based filtering that bypasses velocity editing for such prompts, reducing computational cost while preserving generation for benign prompts. Based on this filtering, we propose VESFlow+, a stronger variant of VESFlow that not only edits the velocity toward the safe direction, but also pushes it away from the unsafe direction. Experimental results show that VESFlow+ removes the target concept, reducing the attack success rate as measured by NudeNet to 6.3% on Ring-A-Bell and 6.8% on MMA-Diffusion on the 4-step MeanFlow model, while preserving fidelity on benign prompts.
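
To make the idea of gated velocity editing concrete, the toy sketch below may help. It is not the authors' algorithm: the abstract does not give the safe-conditional posterior or the risk score, so the linear combination rule, the weights `w_safe` and `w_unsafe`, the `threshold`, and all function names here are hypothetical stand-ins. The sampler is a plain Euler integrator over a 2-D toy velocity field, not MeanFlow's sampler or a real T2I model.

```python
import numpy as np

def edited_velocity(v_prompt, v_safe, v_unsafe, risk,
                    threshold=0.5, w_safe=1.0, w_unsafe=0.0):
    """Hypothetical edit rule: pull toward v_safe, optionally push from v_unsafe."""
    if risk < threshold:
        # Low risk score: bypass editing and keep the prompt velocity as is.
        return v_prompt
    # Assumed guidance-style combination (not taken from the paper).
    return (v_prompt
            + w_safe * (v_safe - v_prompt)
            - w_unsafe * (v_unsafe - v_prompt))

def euler_sample(velocity_fn, x0, num_steps=4):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with a few Euler steps."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy targets standing in for "what each condition would generate".
PROMPT_TARGET = np.array([1.0, 0.0])
SAFE_TARGET = np.array([0.0, 1.0])
UNSAFE_TARGET = np.array([1.0, 0.2])

def toy_velocity_fn(risk):
    """Build a velocity function whose edit is gated by a fixed risk score."""
    def velocity(x, t):
        # Straight-line velocity that reaches `target` at t=1.
        def toward(target):
            return (target - x) / (1.0 - t)
        return edited_velocity(toward(PROMPT_TARGET), toward(SAFE_TARGET),
                               toward(UNSAFE_TARGET), risk)
    return velocity

benign = euler_sample(toy_velocity_fn(risk=0.1), x0=np.zeros(2))
flagged = euler_sample(toy_velocity_fn(risk=0.9), x0=np.zeros(2))
```

In this toy setting, `benign` follows the unedited prompt velocity, while `flagged` is steered by the safe velocity at every one of the four steps, with the prompt itself never modified. Setting `w_unsafe` above zero would add a repulsion term loosely in the spirit of VESFlow+, again only as an assumed form.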
