Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

2026-05-26 · Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Machine Learning
AI summary

The authors study how to guide image-generating diffusion models with features learned by a separate, pre-trained self-supervised model, without requiring additional labeled data. They find that conditioning the diffusion model on these features improves image quality in unconditional generation, where no prompt or label is given. They also show that moving along directions in the learned feature space controls the generated image, producing smooth and disentangled variations. This is a preliminary exploration of combining diffusion models with self-supervised representation conditioning.

diffusion models, image generation, self-supervised learning, conditioning, feature representation, image editing, disentanglement, semantic control, unconditional generation
Authors
Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen
Abstract
Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text prompts or semantic maps, which require extensively annotated datasets. In this preliminary work, we explore diffusion models conditioned on representations from a pre-trained self-supervised model. The self-conditioning mechanism not only improves the quality of unconditional image generation, but also provides a representation space that can be used to control the generation. We explore this conditioning space by identifying directions of variation, and demonstrate promising properties in terms of smoothness and disentanglement.
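
To make the idea of "directions of variation" in a conditioning space concrete, the following is a minimal, dependency-free sketch. It is an illustration under stated assumptions, not the authors' method: the abstract does not say how directions are identified, so the sketch assumes a PCA-style leading direction (found here by power iteration). The function names `top_direction` and `traverse`, and the toy 3-dimensional representations, are hypothetical; a real setup would use vectors produced by the pre-trained self-supervised encoder.

```python
import math
import random


def top_direction(reps, iters=100, seed=0):
    """Estimate the leading principal direction of `reps` by power iteration.

    `reps` is a list of equal-length vectors (stand-ins for encoder outputs).
    Assumption: PCA is used here only as one plausible way to find a direction.
    """
    n, dim = len(reps), len(reps[0])
    mean = [sum(r[i] for r in reps) / n for i in range(dim)]
    centered = [[x - m for x, m in zip(r, mean)] for r in reps]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # Apply the (unnormalised) covariance implicitly: w = X^T (X v).
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(p * row[i] for p, row in zip(proj, centered)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep the estimate at unit length
    return v


def traverse(rep, direction, steps):
    """Shift one representation along `direction` by each step size."""
    return [[x + s * d for x, d in zip(rep, direction)] for s in steps]


# Toy data: most variance lies along the first axis (hypothetical values).
data_rng = random.Random(1)
reps = [
    [3.0 * data_rng.gauss(0, 1), 0.1 * data_rng.gauss(0, 1), 0.1 * data_rng.gauss(0, 1)]
    for _ in range(200)
]
direction = top_direction(reps)
variants = traverse(reps[0], direction, steps=[-1.0, 0.0, 1.0])
```

In a full pipeline, each vector in `variants` would be passed as the conditioning input to the diffusion sampler, and the resulting images compared to judge smoothness and disentanglement. That sampling step depends on the specific model and is omitted here.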