Rethinking Forgery Attacks on Semantic Watermarks in Black-Box Settings: A Geometric Distortion Perspective
2026-06-29 • Cryptography and Security
Cryptography and Security; Computer Vision and Pattern Recognition
AI summary
The authors studied how hidden messages (semantic watermarks) in AI image generators can be copied or faked by outsiders who do not know the generator's inner workings (black-box attacks). They found that, because the copying model differs from the original, there is a minimum amount of error that cannot be avoided, and that this error distorts the hidden message in structured, predictable ways rather than as random noise. Using this insight, they created a new way to detect fake watermarked images before the watermark itself is checked. Their experiments show the method works across a range of black-box attack scenarios while remaining robust to common image distortions.
semantic watermark, latent diffusion models, black-box attack, rate-distortion theory, latent space, manifold, proxy model, target model, geometric deviation, forgery detection
Authors
Cheng-Yi Lee, Yichi Zhang, Yuchen Yang, Chun-Shien Lu, Jun-Cheng Chen
Abstract
Recent studies have shown that semantic watermarks, which embed information into the initial noise of latent diffusion models (LDMs), are vulnerable to black-box forgery attacks. However, existing methods rely primarily on empirical evidence and lack a rigorous theoretical understanding of the conditions under which such attacks succeed or fail. To bridge this gap, we rethink the nature of such attacks through the lens of rate-distortion theory in the latent space. Our analysis identifies an irreducible distortion floor arising from structural mismatches between proxy and target models, which fundamentally limits the fidelity of forged watermarks. We further characterize this distortion as structured geometric deviations on the latent manifold, taking the form of global drift and local deformation rather than stochastic noise. Leveraging these insights, we propose a scheme-agnostic detection method that identifies forged samples prior to watermark verification. Extensive experiments demonstrate the effectiveness of our method across diverse black-box scenarios, while preserving robustness to common distortions.
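The abstract contrasts two kinds of deviation in the latent space: structured geometric distortion (global drift plus local deformation) and stochastic noise. The sketch below is not the authors' detector. It only illustrates that distinction under simplifying assumptions made here for illustration: latents are flattened to 1-D lists, the global drift is taken as the mean of the deviation z_hat - z_ref, the local deformation as the zero-mean residual, and the residual's lag-1 autocorrelation serves as a crude stand-in for "structured versus random." How z_ref and z_hat would be obtained in practice is left open. The function names (decompose_deviation, lag1_autocorr, deviation_features) and the synthetic perturbations are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev


def decompose_deviation(z_ref, z_hat):
    """Split the deviation d = z_hat - z_ref into a global drift
    (the mean of d) and a local deformation (the zero-mean residual)."""
    d = [b - a for a, b in zip(z_ref, z_hat)]
    drift = fmean(d)
    residual = [x - drift for x in d]
    return drift, residual


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence: close to 0 for i.i.d.
    noise, close to 1 for smooth, spatially coherent deviations."""
    mu = fmean(x)
    num = sum((x[i] - mu) * (x[i + 1] - mu) for i in range(len(x) - 1))
    den = sum((v - mu) ** 2 for v in x)
    return num / den if den > 0 else 0.0


def deviation_features(z_ref, z_hat):
    """Hypothetical features describing how z_hat deviates from z_ref."""
    drift, residual = decompose_deviation(z_ref, z_hat)
    return {
        "drift": drift,                                     # global shift
        "deformation_scale": pstdev(residual),              # size of local deformation
        "deformation_smoothness": lag1_autocorr(residual),  # structure vs. noise
    }


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1024
    z_ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Stochastic perturbation: i.i.d. Gaussian noise.
    z_noisy = [z + rng.gauss(0.0, 0.1) for z in z_ref]
    # Structured perturbation: constant offset (drift) plus a smooth wave (deformation).
    z_struct = [z + 0.3 + 0.2 * math.sin(2 * math.pi * i / 256)
                for i, z in enumerate(z_ref)]
    print(deviation_features(z_ref, z_noisy))
    print(deviation_features(z_ref, z_struct))
```

On these synthetic inputs, the noisy perturbation should give a drift and a smoothness both near zero, while the structured one should give a drift near 0.3 and a smoothness near 1. How such features would be thresholded or combined into an actual forgery detector is not specified in the abstract and is not implied here.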