Contrastive Augmented Transformer with Domain-specific Enhancement for Robust Multi-scenario Metal Surface Defect Detection

2026-06-01 · Computer Vision and Pattern Recognition

AI summary

The authors address the detection of small, subtle defects on metal surfaces, a task that matters for manufacturing quality control but is hampered by scarce annotated examples and defect patterns that vary in scale. They propose the Contrastive Augmented Transformer (CAT), which pairs a hierarchical Swin Transformer backbone with a redesigned feature pyramid network to detect defects at different scales. To improve robustness, they add a droplet augmentation that mimics real-world noise and a hard negative mining strategy that focuses the contrastive loss on ambiguous defect regions. On KolektorSDD2, CAT reaches a pixel-level AUROC of 99.54%, and it also generalizes to three unseen datasets (KSDD1, MTD, and MSDD), suggesting it could be useful across industrial settings.

Metal surface defect detection, Swin Transformer, Feature pyramid network, Contrastive loss, Hard negative mining, Data augmentation, Domain adaptation, AUROC, Multi-scale defect detection
Authors
Yiyao Liu, Wenxiao He, Liyuan Ren, Huan Wang
Abstract
Metal surface defect detection is critical for maintaining product quality in industrial manufacturing. However, it faces significant challenges, including limited annotated data, difficulty in identifying subtle multi-scale defects, and poor generalization across diverse scenarios. To address these issues, this paper proposes a novel Contrastive Augmented Transformer (CAT) framework for robust defect detection. CAT employs a hierarchical Swin Transformer backbone and redesigns the feature pyramid network to effectively fuse low-level textures with high-level semantics, enabling precise modeling of subtle and multi-scale defect patterns. To enhance robustness under real-world noise conditions, we propose a domain-specific droplet augmentation algorithm. Furthermore, we incorporate a hard negative mining strategy into the contrastive loss to strengthen the model's discrimination ability in ambiguous defect regions. Experimental results on the KolektorSDD2 dataset demonstrate that CAT achieves a pixel-level AUROC of 99.54%, outperforming existing methods. In addition, CAT exhibits superior generalization and robustness on three unseen datasets, including KSDD1, MTD for tile defects, and MSDD for rail surface defects, demonstrating its potential for wide-scale industrial deployment.
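The headline result above is a pixel-level AUROC. The abstract does not describe the evaluation code, so the following is only a minimal sketch of how that metric is commonly computed: every pixel of the predicted anomaly map is treated as one scored sample, and AUROC is obtained from the rank-sum (Mann-Whitney U) statistic. The names `pixel_auroc` and `flatten`, and the toy 2x2 maps, are illustrative assumptions and not taken from the paper.

```python
def flatten(grid):
    """Flatten a 2D list (H x W) into a 1D list of pixel values."""
    return [value for row in grid for value in row]


def pixel_auroc(scores, labels):
    """AUROC over pixels, computed from the Mann-Whitney U statistic.

    scores: per-pixel anomaly scores (higher means more likely defective)
    labels: per-pixel ground truth, 1 for defect pixels and 0 for normal ones
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one defect and one normal pixel")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the end of the group of pixels sharing the same score.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # The group occupies 1-based ranks i+1 .. j; use their average.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_group = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_group
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Toy example (hypothetical values): a 2x2 anomaly map and its defect mask.
score_map = [[0.10, 0.40],
             [0.35, 0.80]]
gt_mask = [[0, 0],
           [1, 1]]

auroc = pixel_auroc(flatten(score_map), flatten(gt_mask))
print(auroc)  # 0.75: three of the four (defect, normal) pixel pairs are ordered correctly
```

In practice the scores and masks of all test images are usually concatenated before the metric is computed, so that a single AUROC summarizes the whole dataset. Whether the paper follows that convention is not stated in the abstract.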