Impostor: An Agent-Curated Benchmark for Realistic AIGC Manipulation Localization

2026-06-03
Computer Vision and Pattern Recognition

AI summary

The authors introduce Impostor, a dataset of 100,000 AI-edited images for detecting and localizing image manipulations. The images are produced by CraftAgent, an automated closed-loop system that generates realistic, varied edits using seven recent AIGC models. They also propose PhaseAware-Net (PANet), a detection method that combines local phase modeling with semantic understanding to localize manipulated regions that look plausible but are forensically disrupted. Experiments show that Impostor is challenging for existing detection tools, while PANet performs well on it and on other public benchmarks.

image manipulation, generative AI, dataset, image forensics, manipulation localization, AI-generated content (AIGC), semantic analysis, phase modeling, computer vision, machine learning
Authors
Zhenliang Li, Yutao Hu, Qixiong Wang, Wenpeng Du, Hongxiang Jiang, Jiasong Wu, Xiaolong Jiang, Jungong Han
Abstract
Recent advances in generative image editing have improved the realism and controllability of localized image manipulation, raising new challenges for image manipulation detection and localization (IMDL). However, existing IMDL benchmarks still have limitations in visual realism, manipulation diversity, and generator coverage, making it difficult to reflect recent trends in image manipulation. To address these limitations, we introduce Impostor, a high-quality AI-edited image manipulation localization dataset containing 100K manipulated images. Impostor is constructed by CraftAgent, a closed-loop agent framework that integrates scene perception, editing planning, manipulation execution, quality validation, and iterative reflection to automatically generate diverse and visually realistic manipulated images. Moreover, Impostor contains images generated by seven recent AIGC models across three manipulation types and includes multiple manipulated regions, providing a more comprehensive benchmark for AIGC-based IMDL. Furthermore, we propose PhaseAware-Net (PANet), a semantic-forensic framework that introduces local phase modeling and semantic-forensic consistency learning to better localize semantically plausible yet forensically disrupted manipulated regions. Extensive experiments show that Impostor poses significant challenges to existing large vision-language models (LVLMs) and specialized IMDL methods, while PANet achieves superior performance on Impostor and multiple public benchmarks.
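
The closed loop described for CraftAgent (scene perception, editing planning, manipulation execution, quality validation, iterative reflection) can be pictured with a minimal Python sketch. This is an illustration under assumptions, not the authors' implementation: every function name, the string stand-ins for images and masks, the `max_rounds` cap, and the policy of discarding samples that never pass validation are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Edit:
    image: str   # stand-in for edited pixel data (hypothetical)
    mask: str    # stand-in for the manipulated-region mask (hypothetical)
    rounds: int  # number of loop iterations that were needed


def closed_loop_edit(
    image: str,
    perceive: Callable[[str], str],
    plan: Callable[[str, List[str]], str],
    execute: Callable[[str, str], Tuple[str, str]],
    validate: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,  # assumed cap; the abstract does not state one
) -> Optional[Edit]:
    """Run perceive -> plan -> execute -> validate, reflecting on failures."""
    scene = perceive(image)                         # scene perception
    notes: List[str] = []                           # critiques kept across rounds
    for round_idx in range(1, max_rounds + 1):
        instruction = plan(scene, notes)            # editing planning
        edited, mask = execute(image, instruction)  # manipulation execution
        ok, critique = validate(image, edited)      # quality validation
        if ok:
            return Edit(edited, mask, round_idx)
        notes.append(critique)                      # iterative reflection
    return None                                     # assumed: drop failed samples


# Toy stand-ins so the loop can be exercised end to end.
def toy_perceive(image: str) -> str:
    return f"scene({image})"


def toy_plan(scene: str, notes: List[str]) -> str:
    return f"edit {scene} while avoiding {len(notes)} earlier issue(s)"


def toy_execute(image: str, instruction: str) -> Tuple[str, str]:
    return f"{image}+edit", "mask"


def make_toy_validator(fail_first: int) -> Callable[[str, str], Tuple[bool, str]]:
    """Build a validator that rejects the first `fail_first` candidates."""
    calls = {"n": 0}

    def validate(original: str, edited: str) -> Tuple[bool, str]:
        calls["n"] += 1
        if calls["n"] <= fail_first:
            return False, "visible seam near the edited region"
        return True, ""

    return validate
```

Calling `closed_loop_edit("img.png", toy_perceive, toy_plan, toy_execute, make_toy_validator(1))` rejects the first candidate, feeds the critique back into planning, and accepts the second. In a real pipeline the callables would wrap a perception model, a planner, an image editor, and a quality checker.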