MemLeak: Diagnosing Information Leaks in Multimodal Agent Memory

2026-06-29

Machine Learning
AI summary

The authors studied how multimodal AI agents with memory can still reveal a fact they were asked to forget: even after the text entry is deleted, the fact can stay recoverable from retained images. Visual cues in those images, including images tagged to unrelated facts, can reveal the fact, so deleting text alone is not enough. They introduce the Information Provenance Graph (IPG), a taxonomy that classifies memory representations by how deletable they are. Their tests show that both retained images and retained correlated text leak information, and that content-aware semantic deletion greatly reduces the image leakage. They confirmed the findings across several VLMs and checked the reliability of the automated judge against two human annotators.

multimodal AI, visual language models (VLMs), memory deletion, Information Provenance Graph (IPG), MemLeak benchmark, content-aware semantic deletion, information leakage, image-based memory, dual-annotator validation
Authors
Kuan Wang, Chao Zhang
Abstract
When a multimodal AI agent is asked to forget a fact, current memory systems usually delete the text entry and report success. We find that the fact can remain recoverable from retained user images, including images tagged to entirely different facts, because VLMs use implicit visual cues at inference time. We introduce the Information Provenance Graph (IPG), a taxonomy that classifies memory representations by deletion affordance. The IPG reveals that deletion fails through multiple channels. Our benchmark, MemLeak, measures this across a deletion cascade: direct probing of deletion-capable systems yields <1% recovery, but retained correlated text enables 18.3% recovery, and retained images enable 12.0% recovery (0.0% blind baseline, 0.3% FPR). Of the image leaks, 47% are not text-recoverable. Content-aware semantic deletion reduces the image residual to 2.0%. The residual appears across multiple VLMs, a production memory system, and real Unsplash-licensed photographs. Dual-annotator human validation (kappa = 0.88) confirms judge reliability.
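
The gap between deleting a text entry and content-aware deletion can be illustrated with a toy sketch. This is not the MemLeak benchmark or the authors' implementation. The `MemoryItem` type, the `revealed_facts` annotation, the function names, and the example store are all hypothetical, and the sketch assumes every item's revealable facts are already known, which a real system would have to estimate.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    """Hypothetical memory record; not a structure from the paper."""
    item_id: str
    kind: str            # "text" or "image"
    tagged_fact: str     # the fact this item was filed under
    # Assumption: facts a model could infer from the content are pre-annotated.
    revealed_facts: set[str] = field(default_factory=set)


def tag_based_delete(store: list[MemoryItem], fact: str) -> list[MemoryItem]:
    """Delete only the text entries filed under the fact; retain all images."""
    return [m for m in store if not (m.kind == "text" and m.tagged_fact == fact)]


def content_aware_delete(store: list[MemoryItem], fact: str) -> list[MemoryItem]:
    """Delete every item, text or image, tagged to or able to reveal the fact."""
    return [
        m for m in store
        if m.tagged_fact != fact and fact not in m.revealed_facts
    ]


def residual_items(store: list[MemoryItem], fact: str) -> list[str]:
    """IDs of surviving items whose content could still reveal the fact."""
    return [m.item_id for m in store if fact in m.revealed_facts]


# Hypothetical store: image i2 is tagged to a different fact but shows the dog.
STORE = [
    MemoryItem("t1", "text", "owns_dog", {"owns_dog"}),
    MemoryItem("i1", "image", "owns_dog", {"owns_dog"}),
    MemoryItem("i2", "image", "lives_in_lisbon", {"lives_in_lisbon", "owns_dog"}),
    MemoryItem("t2", "text", "lives_in_lisbon", {"lives_in_lisbon"}),
]

if __name__ == "__main__":
    after_tag = tag_based_delete(STORE, "owns_dog")
    print(residual_items(after_tag, "owns_dog"))
    after_content = content_aware_delete(STORE, "owns_dog")
    print(residual_items(after_content, "owns_dog"))
```

In this toy store, deleting the text entry leaves two images that still reveal the fact, one of them tagged to an unrelated fact, while the content-aware pass removes both. The sketch reaches a zero residual only because its annotations are assumed perfect; the abstract reports a 2.0% image residual after content-aware semantic deletion.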