MythraGen: Two-Stage Retrieval Augmented Art Generation Framework

2026-06-22 • Computer Vision and Pattern Recognition

AI summary

The authors developed a new method called MythraGen to create artistic images from text descriptions. They combine searching for similar art in a big database with a way to fine-tune an image generator so it learns specific artistic styles. By using this approach, their system produces artworks that better match what the user describes. Tests with real art data show their method works better than other current tools.

text-to-image generation, artistic image generation, retrieval augmented generation, LoRA, Stable Diffusion, fine-tuning, feature extraction, WikiArt dataset, generative models

Authors

Quang-Khai Le, Cong-Long Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract

Text-to-image generation has advanced rapidly with the development of generative models. However, producing high-quality, contextually accurate images that faithfully match the provided textual description remains challenging, especially in artistic generation. In this paper, we present MythraGen, a simple yet efficient retrieval-augmented generation framework for text-to-artistic-image generation that integrates an art retrieval mechanism with LoRA-based model fine-tuning. Our method extracts features from a large-scale art dataset and optimizes the generation process by combining artist-specific styles and content. Specifically, the images in an external art database that are most similar to the query prompt are retrieved and used to fine-tune Stable Diffusion with LoRA for the desired art generation. Experimental results and user studies on the WikiArt dataset show that the proposed method generates artworks that closely match the user's input, significantly outperforming existing solutions.
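
The retrieval stage described in the abstract can be illustrated with a minimal sketch. The abstract does not state which feature extractor or similarity measure is used, so this example assumes that the prompt and the artworks have already been embedded into a shared vector space and that cosine similarity ranks them. The names `cosine_similarity`, `retrieve_top_k`, `toy_database`, and `toy_query`, as well as the embedding values, are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, database, k):
    """Return the names of the k database entries most similar to the query.

    `database` maps an artwork name to its embedding. In a real system the
    embeddings would come from a feature extractor; here they are toy values.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in database.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical 3-dimensional embeddings standing in for real image features.
toy_database = {
    "artwork_a": [0.9, 0.1, 0.0],
    "artwork_b": [0.0, 1.0, 0.0],
    "artwork_c": [0.7, 0.3, 0.1],
    "artwork_d": [0.0, 0.0, 1.0],
}

# Hypothetical embedding of the user's text prompt.
toy_query = [1.0, 0.2, 0.0]

# The selected artworks would form the fine-tuning set for the second stage.
selected = retrieve_top_k(toy_query, toy_database, k=2)
print(selected)
```

In this sketch the returned names stand in for the retrieved images that, per the abstract, are then used to fine-tune Stable Diffusion with LoRA. The fine-tuning step itself is not shown here.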
