Towards Active Real-to-Twin Inspection: A New Paradigm for Zero-Shot Anomaly Detection

2026-05-25 · Computer Vision and Pattern Recognition

AI summary

The authors address defect detection in industrial inspection when only CAD models of the objects are available and fixed camera viewpoints are too restrictive. They introduce a new task, Real-to-Twin Anomaly Detection, which compares real-world observations directly against geometrically matched 3D CAD Digital Twins. For this task they propose AVATAR, a method that learns to align defect-free real images with their digital counterparts, so that anomalies can be localized as regions that fail to align, without requiring examples of defects. Their experiments show that AVATAR outperforms adapted state-of-the-art baselines, particularly under severe viewpoint variations.

zero-shot anomaly detection, industrial inspection, CAD Digital Twins, Sim2Real domain gap, semantic alignment, defect localization, viewpoint variations, 3D object matching, unlabeled anomaly detection
Authors
Jiaxuan Liu, Yunkang Cao, Yufeng Chen, Chunyang Li, Yuhuan Du, Hui Zhang
Abstract
The deployment of zero-shot anomaly detection (AD) in embodied industrial inspection is severely bottlenecked by its reliance on passive, fixed-viewpoint 2D imagery. Such formulations cannot accommodate the active, dynamic observations required in real-world environments. To overcome this limitation, we introduce Real-to-Twin Anomaly Detection, a novel task that evaluates physical observations directly against geometrically matched CAD Digital Twins. To tackle this task, we propose AVATAR, a framework designed to learn robust semantic alignment between Real and Digital Twins. By bridging benign Sim2Real domain gaps using only defect-free pairs, AVATAR transforms CAD priors into dynamic, anomaly-free references. This formulation enables the model to localize diverse anomalies in a zero-shot manner as unalignable deviations, eliminating the need for defect annotations. Extensive experiments demonstrate that AVATAR substantially outperforms adapted state-of-the-art baselines and exhibits exceptional robustness to severe viewpoint variations. The code and dataset will be made publicly available.
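
To make the idea of "anomalies as unalignable deviations" concrete, the following is a minimal sketch and not AVATAR's implementation. The abstract does not specify the architecture or the scoring function, so everything here is an assumption: the feature grids stand in for the outputs of hypothetical Real and Twin encoders already trained to agree on defect-free pairs, cosine distance is swapped in as one plausible misalignment score, and the names `anomaly_map` and `localize` and the 0.5 threshold are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def anomaly_map(real_feats, twin_feats):
    """Per-location score: 1 - cosine similarity between real and twin features.

    Both inputs are H x W grids of feature vectors. They are ASSUMED to come
    from encoders aligned on defect-free Real/Twin pairs (hypothetical here).
    """
    if len(real_feats) != len(twin_feats):
        raise ValueError("real and twin grids must have the same height")
    return [
        [1.0 - cosine_similarity(r, t) for r, t in zip(real_row, twin_row)]
        for real_row, twin_row in zip(real_feats, twin_feats)
    ]


def localize(scores, threshold=0.5):
    """Return (row, col) locations whose score exceeds the (assumed) threshold."""
    return [
        (i, j)
        for i, row in enumerate(scores)
        for j, s in enumerate(row)
        if s > threshold
    ]


# Toy 2 x 2 grid of 2-D features. The real features match the twin up to
# scale everywhere except the bottom-right cell, which plays the defect.
twin = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [1.0, 0.0]]]
real = [[[2.0, 0.0], [0.0, 3.0]],
        [[1.0, 1.0], [0.0, 1.0]]]

scores = anomaly_map(real, twin)
defects = localize(scores)
print(defects)
```

In this toy example only the bottom-right location is flagged, because it is the one place where the real feature cannot be matched to its twin. Cosine distance ignores feature magnitude, which is why the rescaled but otherwise matching cells score close to zero; a learned alignment would play the analogous role of absorbing benign Sim2Real differences.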