ImageAuditor: Membership Inference Attack against Image-based Retrieval-Augmented Generation

2026-06-02

Cryptography and Security
AI summary

The authors study how to tell whether specific images are included in the large external image databases used by retrieval-augmented systems that generate images from text or answer questions. Earlier membership inference attacks work for text-based systems but do not transfer to image-based ones, because an image cannot be placed inside a text query to force its retrieval, and text-to-image generators produce pictures rather than answers to questions. To address this, the authors propose ImageAuditor, which splits each attack query into two parts: a retrieval segment that steers the system toward retrieving the target image, and an extraction segment that draws a membership signal out of the generator's output. With dedicated optimization for each part, the method exceeds 80% AUROC using only four queries per audited image.

Image-based Retrieval-Augmented Generation, Membership Inference Attacks, Cross-modal Retrieval, Text-to-Image Generation, Policy Optimization, Reward-Guided Optimization, K-means Clustering, AUROC, Prompting Strategy
Authors
Jinghuai Zhang, Pengyue Yu, Zhexiao Lin, Kunlin Cai, Fnu Suya, Yuan Tian
Abstract
Image-based Retrieval-Augmented Generation (IRAG) conditions a frozen generator on reference images retrieved from an external database, supporting both text-to-image (T2I) and question answering (Q&A) tasks. Because these databases are opaque and web-scraped, copyright holders need ways to audit whether specific images appear in them. While prior work employs membership inference attacks (MIAs) to audit uni-modal, text-based RAG, these attacks fail to transfer to IRAG due to two key challenges. First, cross-modal retrieval: text-RAG MIAs force retrieval of the target passage by injecting its content into the query, an option that is unavailable in IRAG because images cannot be embedded into text queries; even accurate image captions fail to bridge the modality gap. Second, discriminative signal extraction: text-RAG MIAs extract membership signals by prompting the generator to answer multiple questions over the target passage, whereas T2I generators in IRAG produce images rather than follow Q&A commands. To fill this gap, we introduce the first MIA tailored to IRAG, ImageAuditor, which decomposes each attack query into a retrieval segment and an extraction segment, enabling dedicated optimization for each challenge. For retrieval, we propose Reward-Guided Policy Optimization (RGPO), which updates a stochastic policy from reward-ranked candidates to navigate the cross-modal embedding landscape and admits finite-sample optimality guarantees to balance exploration and exploitation. For extraction, we analyze the distribution of the MIA score to guide the co-design of the prompting strategy and scoring rule, and derive task-specific instantiations for T2I and Q&A tasks. We aggregate signals across queries via K-means clustering for reliable membership decisions. Across various IRAG systems, ImageAuditor exceeds 80% AUROC with only four queries per audited image and remains robust across diverse settings.
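The aggregation and evaluation steps mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say how K-means is applied, so the scheme below is an assumption rather than the authors' procedure: all per-query MIA scores are pooled, split into two clusters with plain 2-means, and each audited image receives the fraction of its queries that land in the high-score cluster. The function names (`two_means_1d`, `aggregate_votes`, `auroc`), the image identifiers, and the toy scores are hypothetical and exist only for illustration.

```python
def two_means_1d(values, iters=50):
    """Plain 2-means on scalars; returns (low_centroid, high_centroid)."""
    lo, hi = min(values), max(values)  # deterministic initialisation
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)  # `low` always contains min(values)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def aggregate_votes(per_query_scores):
    """Map each image id to the fraction of its queries in the high cluster.

    ASSUMPTION: pooling scores and voting is one plausible reading of
    "aggregate signals across queries via K-means clustering".
    """
    pooled = [s for scores in per_query_scores.values() for s in scores]
    lo, hi = two_means_1d(pooled)
    return {
        image_id: sum(abs(s - hi) < abs(s - lo) for s in scores) / len(scores)
        for image_id, scores in per_query_scores.items()
    }


def auroc(scores, labels):
    """AUROC as the probability that a member outscores a non-member (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic toy scores, four queries per audited image (not from the paper).
    toy_scores = {
        "img_a": [0.91, 0.85, 0.40, 0.88],  # treated as a member
        "img_b": [0.80, 0.79, 0.83, 0.35],  # treated as a member
        "img_c": [0.20, 0.31, 0.25, 0.75],  # treated as a non-member
        "img_d": [0.15, 0.22, 0.30, 0.28],  # treated as a non-member
    }
    votes = aggregate_votes(toy_scores)
    decisions = {image_id: v >= 0.5 for image_id, v in votes.items()}
    print(votes)
    print(decisions)
    print(auroc([votes[k] for k in toy_scores], [1, 1, 0, 0]))
```

Voting over clustered per-query scores, instead of averaging raw scores, keeps a single noisy query from dominating the decision for an image. Whether ImageAuditor clusters per-query scores, per-image score vectors, or something else is not stated in the abstract.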