Training-Free Multi-Concept LoRA Composition with Prompt-Aware Weighting

2026-06-02

Computer Vision and Pattern Recognition, Machine Learning
AI summary

The authors address the problem of combining multiple personalized concepts and styles in text-to-image diffusion models customized with Low-Rank Adaptation (LoRA). Naively mixing LoRAs often causes interference between concepts, which degrades image quality and fidelity to each concept's reference images. To address this, they propose two methods, W-Switch and W-Composite, that weight each LoRA according to the semantic influence of its trigger words in the prompt. They also introduce an image-based evaluation framework that compares real reference images against automatically segmented concept regions in the generated images, and they report improved visual quality, identity preservation, and compositionality on the ComposLoRA testbed.

Low-Rank Adaptation (LoRA), diffusion models, text-to-image generation, multi-concept customization, prompt tokens, semantic weighting, image fidelity, identity preservation, compositionality, evaluation metrics
Authors
Georgios Tsoumplekas, Stella Bounareli, Vasileios Argyriou
Abstract
Low-Rank Adaptation (LoRA) successfully enables personalization in text-to-image generation by adapting pre-trained diffusion models to specific visual concepts and styles. However, extending such models to multi-concept customization remains challenging. Naively combining multiple LoRA weights or their outputs often leads to interference among concepts, resulting in degraded visual quality and reduced fidelity to the reference images of individual concepts. This paper proposes a simple yet effective approach for multi-concept customization by optimally combining the outputs of multiple LoRA modules. We leverage the relative importance of each concept during generation, as inferred from its corresponding prompt tokens, and introduce two methods, W-Switch and W-Composite, that employ a prompt-aware importance weighting strategy: each LoRA is weighted according to the semantic influence of its trigger words in the target prompt. In addition, we extend existing quantitative evaluation metrics by proposing a new image-based similarity evaluation framework that assesses image fidelity and identity preservation through comparisons between real-world reference images and automatically segmented concept regions from generated images. We evaluate our approach on the ComposLoRA testbed and demonstrate consistent improvements over existing state-of-the-art methods in terms of visual quality, identity preservation, and compositionality. Qualitative evaluations, including a Large Language Model (LLM)-based assessment and a user study, further validate the effectiveness of the proposed methods and align with the newly introduced quantitative image-based metrics. Our code is available at https://github.com/GeorgeTsoumplekas/Prompt-Aware-Multi-LoRA-Composition.
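To make the idea of prompt-aware weighting concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a scalar influence score for each LoRA's trigger words has already been obtained by some means (the abstract does not specify how), and it uses a softmax to turn those scores into weights, which is an illustrative choice rather than the method used in W-Switch or W-Composite. The names `importance_weights`, `combine_outputs`, `trigger_scores`, and `temperature` are hypothetical, and the per-LoRA outputs are toy vectors standing in for model predictions.

```python
import math


def importance_weights(trigger_scores, temperature=1.0):
    """Turn per-LoRA influence scores into weights that sum to 1.

    ASSUMPTION: softmax normalization is an illustrative choice, not
    necessarily the normalization used in the paper.
    """
    if not trigger_scores:
        raise ValueError("at least one LoRA score is required")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    names = list(trigger_scores)
    scaled = [trigger_scores[name] / temperature for name in names]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


def combine_outputs(outputs, weights):
    """Weighted sum of per-LoRA output vectors (toy stand-ins for predictions)."""
    length = len(next(iter(outputs.values())))
    combined = [0.0] * length
    for name, vector in outputs.items():
        if len(vector) != length:
            raise ValueError("all LoRA outputs must have the same length")
        for i, value in enumerate(vector):
            combined[i] += weights[name] * value
    return combined


if __name__ == "__main__":
    # Hypothetical influence scores for two concepts in one prompt.
    scores = {"character": 2.0, "style": 1.0}
    weights = importance_weights(scores)
    # Toy per-LoRA outputs; in practice these would be tensors.
    outputs = {"character": [1.0, 0.0], "style": [0.0, 1.0]}
    print(weights)
    print(combine_outputs(outputs, weights))
```

In this sketch, a concept whose trigger words score higher contributes more to the combined output, and lowering `temperature` sharpens the weights toward the most influential concept.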