Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision

2026-06-08 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The authors point out that current anomaly detection methods work very well on some standard datasets but struggle when conditions such as object size, viewing angle, or background change, which is common in real-world use. To address this, they propose three improvements: isolating objects using masking, allowing a model component called the teacher to adapt to new situations, and generating synthetic images to train the system more effectively. Using these ideas, they improved results on a difficult dataset named AeBAD. Their work makes anomaly detection more reliable under real-world variations.

Anomaly Detection, Foreground-Background Masking, Student-Teacher Model, Domain Adaptability, Data Augmentation, Synthetic Images, Diffusion Models, Masked Multiscale Reconstruction (MMR), AeBAD Dataset

Authors

Mateo Diaz-Bone, Daniel Caraballo, Florian Scheidegger, Thomas Frick, Mattia Rigotti, Andrea Bartezzaghi, Roy Assaf, Niccolo Avogaro, Yagmur G. Cinar, Brown Ebouky, Filip M. Janicki, Piotr S. Kluska, Cezary Skura, Cristiano Malossi

Abstract

Recent anomaly detection methods achieve perfect detection and segmentation scores on well-established datasets such as MVTec. However, many of these methods struggle when foundational assumptions, such as consistent object scale, viewpoint, background, illumination, and centered placement, are violated. Such variations render these methods unusable in many real-world scenarios. To address these limitations, we introduce three key contributions: (1) a visual prompting pipeline that isolates objects using foreground-background masking; (2) a mechanism for unfreezing the teacher in student-teacher models to improve domain adaptability; and (3) a data augmentation strategy leveraging diffusion-generated synthetic images to enhance anomaly detection performance. Using the Masked Multiscale Reconstruction (MMR) model as our backbone, we achieve a 3.5 percentage point improvement over the previous state of the art on the challenging AeBAD dataset.
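
As a rough illustration of the foreground-background masking idea in contribution (1), the sketch below blanks out background pixels given a binary object mask. It is not the authors' pipeline: the function name `apply_foreground_mask`, the `fill_value` parameter, and the assumption that a mask is already available (for instance from some prompted segmentation step) are all hypothetical, and the image is a plain nested list of grayscale values to keep the example dependency-free.

```python
def apply_foreground_mask(image, mask, fill_value=0):
    """Keep pixels where the mask is truthy; replace the rest with fill_value.

    `image` and `mask` are nested lists of equal height and width
    (hypothetical inputs, for illustration only).
    """
    if len(image) != len(mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same height and width")
    return [
        [pixel if keep else fill_value for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# Toy 4x4 grayscale image with the object assumed to occupy the central 2x2 block.
image = [[200] * 4 for _ in range(4)]
mask = [[1 <= r <= 2 and 1 <= c <= 2 for c in range(4)] for r in range(4)]
masked = apply_foreground_mask(image, mask)
```

After masking, only the central block keeps its original intensity, so a downstream detector would no longer see background variation. A real pipeline would typically operate on array or tensor images rather than nested lists.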
