LoCC: Detection and Localization of Lip-Syncing Deepfakes via Counterfactual Frame Consistency

2026-06-22 • Computer Vision and Pattern Recognition

AI summary

The authors created a new method called LoCC to spot lip-syncing deepfake videos, which are hard to detect because the manipulation is confined mostly to the mouth region and changes over time. Instead of looking at whole videos at once, their method checks whether each frame matches an estimate of what that frame should look like, built from nearby frames. Real videos show steady, consistent mouth movements, while deepfakes introduce small local mismatches. Using a teacher-student learning approach, their method detects these mismatches better than previous methods on several benchmark datasets, even when videos are compressed or come from different sources.

lip-sync deepfake, temporal modeling, spatial modeling, frame-level detection, counterfactual estimation, teacher-student learning, video compression, benchmark datasets, deepfake detection, localization

Authors

Soumyya Kanti Datta, Shan Jia, Siwei Lyu

Abstract

Lip-syncing deepfakes are among the most challenging forms of manipulated media because their artifacts are localized almost exclusively to the mouth region and evolve dynamically over time. Detecting such deepfakes requires precise temporal and spatial modeling of lip motion. In this paper, we propose LoCC, a novel detection framework that performs fine-grained detection and localization of lip-syncing deepfakes at both segment and frame levels. Unlike prior approaches that analyze videos holistically, our method evaluates whether each frame aligns with a counterfactual estimate generated from its temporal neighbors. Real videos exhibit strong and stable consistency, whereas lip-sync deepfakes introduce localized inconsistencies. Following a teacher-student learning paradigm, our model effectively captures these frame-level discrepancies and achieves superior performance over state-of-the-art methods on multiple benchmark lip-syncing deepfake datasets, including LAV-DF, AVDF1M, FakeAVCeleb, and KODF, and generalizes well across compression levels and datasets.
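
The frame-consistency idea described in the abstract can be illustrated with a toy sketch. This is not the LoCC model: the abstract does not specify how the counterfactual estimate is produced, so the sketch assumes, purely for illustration, that each frame is a short feature vector (for example, mouth landmark coordinates) and that the estimate is the average of the two neighboring frames. The function names `counterfactual_scores` and `localize` and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence


def counterfactual_scores(frames: Sequence[Sequence[float]]) -> List[float]:
    """Score each frame by its distance from an estimate built from its neighbors.

    Assumption: the estimate is the mean of the previous and next frame.
    A learned model would replace this simple stand-in.
    """
    n = len(frames)
    scores = [0.0] * n  # first and last frames lack two neighbors, so stay 0.0
    for t in range(1, n - 1):
        estimate = [(a + b) / 2.0 for a, b in zip(frames[t - 1], frames[t + 1])]
        # Euclidean distance between the observed frame and its estimate
        scores[t] = math.sqrt(
            sum((c - e) ** 2 for c, e in zip(frames[t], estimate))
        )
    return scores


def localize(scores: Sequence[float], threshold: float) -> List[int]:
    """Return indices of frames whose inconsistency score exceeds the threshold."""
    return [t for t, s in enumerate(scores) if s > threshold]


# Smooth motion: every frame agrees with the estimate from its neighbors.
real = [[0.0], [1.0], [2.0], [3.0], [4.0]]
# Same sequence with frame 2 altered, standing in for a manipulated frame.
fake = [[0.0], [1.0], [5.0], [3.0], [4.0]]

real_scores = counterfactual_scores(real)
fake_scores = counterfactual_scores(fake)
flagged = localize(fake_scores, threshold=2.0)
```

In this toy setting the smooth sequence scores zero everywhere, while the altered sequence produces its highest score at the altered frame, which is the only index returned by `localize` at this threshold. Frames adjacent to the altered one also receive nonzero scores because their estimates depend on it, so the threshold choice matters for frame-level localization.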
