Failure-Aware Refinement of Vision-Language Model for Lithography Defect Detection

2026-06-08 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The authors developed a two-step method for finding tiny flaws in semiconductor patterns, such as unwanted bridges or contamination. First, they fine-tuned a model called Qwen3-VL to count, classify, and locate these defects in images. Because this first step can still make mistakes, the authors added a second step that learns from those errors and corrects them. This two-step approach makes fewer mistakes than one-step fine-tuning alone.

Semiconductor lithography, Defect detection, Vision-language model, Qwen3-VL, LoRA fine-tuning, Bounding boxes, False positives, Error refinement, Pattern defects

Authors

Pangyun Jeong, Jiyeong Kong, Yuehua Hu, Dohee Jeong, Kyung-Tae Kang

Abstract

Semiconductor lithography inspection requires reliable detection of small pattern defects such as bridge, burr, pinch, and contamination. In this study, we propose a two-stage vision-language framework that combines initial defect detection with prediction refinement. In the first stage, Qwen3-VL is fine-tuned with LoRA as a vision-language adapter to predict defect counts, defect categories, and normalized bounding boxes from lithography images. However, direct fine-tuning can still produce common test-time errors, including false positives, missed defects, and incorrect defect types. To address this limitation, the second stage trains a refinement module on first-stage prediction failures and their corrected labels, allowing the model to review and revise its initial outputs. By learning from cases where the initial adapter fails, the refinement stage improves defect predictions beyond those of single-stage fine-tuning.
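The abstract does not say how first-stage failures are identified. The sketch below shows one plausible way, assuming a greedy intersection-over-union (IoU) match at a 0.5 threshold. That criterion is our assumption, not the authors' stated method. Each first-stage prediction is matched against the ground truth, and the three error types named above (false positives, missed defects, incorrect defect types) are tallied. Images with any error are kept, paired with their corrected labels, as refinement training examples. The `Defect` record, the `(x0, y0, x1, y1)` box convention, and all helper names are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1) in [0, 1]


@dataclass(frozen=True)
class Defect:
    """Hypothetical defect record: category label plus normalized box."""
    category: str  # e.g. "bridge", "burr", "pinch", "contamination"
    box: Box


def normalize_box(box_px: Box, width: float, height: float) -> Box:
    """Convert a pixel-space (x0, y0, x1, y1) box to [0, 1] image coordinates."""
    x0, y0, x1, y1 = box_px
    return (x0 / width, y0 / height, x1 / width, y1 / height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def classify_errors(pred: List[Defect], truth: List[Defect],
                    iou_thr: float = 0.5) -> Dict[str, int]:
    """Greedily match predictions (in list order) to unmatched ground truth.

    A prediction with no ground-truth box at IoU >= iou_thr is a false positive;
    a matched prediction with the wrong label is a wrong type; ground-truth
    boxes left unmatched at the end are missed defects.
    """
    unmatched = list(range(len(truth)))
    errors = {"false_positive": 0, "missed": 0, "wrong_type": 0, "correct": 0}
    for p in pred:
        best_j, best_score = None, iou_thr
        for j in unmatched:
            score = iou(p.box, truth[j].box)
            if score >= best_score:  # keep the highest-IoU candidate
                best_j, best_score = j, score
        if best_j is None:
            errors["false_positive"] += 1
            continue
        unmatched.remove(best_j)
        key = "correct" if p.category == truth[best_j].category else "wrong_type"
        errors[key] += 1
    errors["missed"] = len(unmatched)
    return errors


def build_refinement_set(samples, iou_thr: float = 0.5) -> List[dict]:
    """Keep only images where stage 1 failed, paired with the corrected labels.

    `samples` is an iterable of (image_id, stage1_predictions, ground_truth).
    """
    refine = []
    for image_id, pred, truth in samples:
        errs = classify_errors(pred, truth, iou_thr)
        if errs["false_positive"] or errs["missed"] or errs["wrong_type"]:
            refine.append({"image_id": image_id, "initial": pred,
                           "target": truth, "errors": errs})
    return refine


if __name__ == "__main__":
    truth = [Defect("bridge", (0.10, 0.10, 0.20, 0.20)),
             Defect("pinch", (0.50, 0.50, 0.60, 0.60))]
    pred = [Defect("bridge", (0.11, 0.10, 0.21, 0.20)),          # correct
            Defect("burr", (0.50, 0.50, 0.60, 0.60)),            # wrong type
            Defect("contamination", (0.80, 0.80, 0.90, 0.90))]   # false positive
    print(classify_errors(pred, truth))
    # → {'false_positive': 1, 'missed': 0, 'wrong_type': 1, 'correct': 1}
```

Greedy matching keeps the sketch short. A confidence-sorted or Hungarian matching would be a drop-in alternative if the first stage emitted scores. A mismatch between a predicted defect count and the number of boxes could be flagged in the same way.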
