Revealing Artifacts via Noise Amplification: A Novel Perspective for AI-Generated Video Detection
2026-06-15 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary
The authors address how to distinguish AI-generated videos from real ones, especially videos produced by recent text-to-video models, which are hard to detect. They observe that these videos lack fine visual detail and realistic changes in that detail over time. To expose these differences, they propose Noise Amplification, which extracts subtle noise patterns from bit-planes and amplifies them so that a classifier can pick them up more easily. They evaluate the method on the large-scale GenVidBench dataset and on HardGVD, a new challenging benchmark they introduce, and report that it outperforms previous detection methods.
AI-generated videos, text-to-video models, bit-planes, noise amplification, video fake classification, generative adversarial networks, spatial amplification, temporal aggregation, video forensics, benchmark dataset
Authors
Renxi Cheng, Jie Gui, Hongsong Wang
Abstract
With the rapid advancement of video generation models, distinguishing AI-generated videos from authentic ones has become a challenging problem. Most existing research concentrates on detectors for samples produced by generative adversarial networks. The detection of AI-generated videos, particularly those produced by text-to-video models, remains largely unexplored. Although state-of-the-art text-to-video models can generate realistic visual content that resembles real videos, they fall short in reproducing fine image details and the way those details change across frames. Inspired by this observation, we address AI-generated video detection from the novel perspective of bit-planes, which can effectively describe the details or noise in images and videos. To this end, we propose a simple yet effective approach called Noise Amplification. This approach first extracts noise signals based on bit-planes, then amplifies these noise signals, and finally feeds them into discriminator networks for video fake classification. Noise amplification combines three components: pixel-level intensity enhancement, region-level spatial amplification, and frame-level temporal aggregation. To evaluate AI-generated video detection methods in challenging scenarios, we also introduce a benchmark named HardGVD. Extensive experiments on both the large-scale dataset GenVidBench and HardGVD show that our simple approach significantly outperforms state-of-the-art methods.
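The pipeline described in the abstract (bit-plane noise extraction followed by pixel-level, region-level, and frame-level amplification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state which bit-planes are used or how each amplification step is defined, so the choice of the two lowest bit-planes, the linear intensity stretch, the nearest-neighbour upscaling, and the mean absolute frame difference are all assumptions, and every function name here is hypothetical.

```python
import numpy as np

def low_bit_noise(frame: np.ndarray, num_low_bits: int = 2) -> np.ndarray:
    """Keep only the lowest bit-planes of a uint8 frame (assumed noise proxy)."""
    mask = np.uint8((1 << num_low_bits) - 1)
    return frame & mask

def amplify_intensity(noise: np.ndarray, num_low_bits: int = 2) -> np.ndarray:
    """Pixel-level step (assumed): stretch the low-bit values towards 0..255."""
    scale = 255 // ((1 << num_low_bits) - 1)
    return (noise.astype(np.uint16) * scale).astype(np.uint8)

def amplify_spatial(noise: np.ndarray, factor: int = 2) -> np.ndarray:
    """Region-level step (assumed): nearest-neighbour upscaling of a region."""
    return np.repeat(np.repeat(noise, factor, axis=0), factor, axis=1)

def aggregate_temporal(noise_frames: list) -> np.ndarray:
    """Frame-level step (assumed): mean absolute difference of consecutive frames."""
    stack = np.stack(noise_frames).astype(np.float32)
    return np.abs(np.diff(stack, axis=0)).mean(axis=0)

# Toy example: a 2x2 grayscale frame.
frame = np.array([[0, 1], [2, 255]], dtype=np.uint8)
noise = low_bit_noise(frame)          # values restricted to 0..3
amplified = amplify_intensity(noise)  # values stretched to 0..255
enlarged = amplify_spatial(amplified)
motion = aggregate_temporal([amplified, amplified])  # identical frames
```

In a detector along these lines, maps such as `enlarged` and `motion` would be passed to a classifier in place of, or alongside, the raw frames. The intuition taken from the abstract is that the low-order bits carry the fine detail and noise that generated videos fail to reproduce.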