Failure-Aware Refinement of Vision-Language Model for Lithography Defect Detection

2026-06-08

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors developed a two-step method to find tiny flaws in semiconductor patterns, such as unwanted bridges or contamination. First, they fine-tuned a model called Qwen3-VL to spot and label these defects in images. Because the first step can still make mistakes, the authors added a second step that learns from these errors and corrects them. This two-step approach helps the system make fewer mistakes than one-step training alone.

Semiconductor lithography, Defect detection, Vision-language model, Qwen3-VL, LoRA fine-tuning, Bounding boxes, False positives, Error refinement, Pattern defects
Authors
Pangyun Jeong, Jiyeong Kong, Yuehua Hu, Dohee Jeong, Kyung-Tae Kang
Abstract
Semiconductor lithography inspection requires reliable detection of small pattern defects such as bridge, burr, pinch, and contamination. In this study, we propose a two-stage vision-language framework that combines initial defect detection with prediction refinement. In the first stage, Qwen3-VL is fine-tuned with LoRA as a vision-language adapter to predict defect counts, defect categories, and normalized bounding boxes from lithography images. However, direct fine-tuning may still produce common test-time errors, including false positives, missed defects, and incorrect defect types. To address this limitation, the second stage trains a refinement module using first-stage prediction failures and their corrected labels, allowing the model to review and revise initial outputs. By learning from cases where the initial adapter fails, the refinement process improves defect inference beyond single-stage fine-tuning.
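To make the second stage more concrete, the following is a minimal sketch of how first-stage failures could be turned into refinement training records. It is not the authors' implementation: the abstract does not say how predictions are matched to ground truth, so the greedy IoU matching, the 0.5 threshold, the `Defect` class, the function names, and the record fields are all assumptions made for illustration. Only the three error types (false positives, missed defects, incorrect defect types) and the normalized bounding boxes come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # normalized (x1, y1, x2, y2) in [0, 1]


@dataclass(frozen=True)
class Defect:
    """Hypothetical container for one defect; not from the paper."""
    label: str  # e.g. "bridge", "burr", "pinch", "contamination"
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def classify_errors(preds: List[Defect], truths: List[Defect],
                    iou_thresh: float = 0.5):  # threshold is an assumption
    """Greedily match predictions to ground truth and list the failures."""
    errors = []
    unmatched = set(range(len(truths)))
    for p in preds:
        best_j: Optional[int] = None
        best_iou = 0.0
        for j in sorted(unmatched):
            v = iou(p.box, truths[j].box)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j is None or best_iou < iou_thresh:
            errors.append(("false_positive", p, None))
        else:
            unmatched.discard(best_j)
            if p.label != truths[best_j].label:
                errors.append(("wrong_type", p, truths[best_j]))
    # Any ground-truth defect left unmatched was missed by the first stage.
    for j in sorted(unmatched):
        errors.append(("missed_defect", None, truths[j]))
    return errors


def build_refinement_example(image_id: str, preds: List[Defect],
                             truths: List[Defect]) -> Optional[dict]:
    """Return a refinement record, or None if the first stage was correct."""
    errors = classify_errors(preds, truths)
    if not errors:
        return None
    return {
        "image": image_id,
        "initial_prediction": [{"label": p.label, "box": list(p.box)} for p in preds],
        "error_types": [kind for kind, _, _ in errors],
        "corrected_label": {
            "count": len(truths),
            "defects": [{"label": t.label, "box": list(t.box)} for t in truths],
        },
    }


preds = [Defect("bridge", (0.1, 0.1, 0.3, 0.3)), Defect("burr", (0.6, 0.6, 0.8, 0.8))]
truths = [Defect("pinch", (0.1, 0.1, 0.3, 0.3)),
          Defect("contamination", (0.0, 0.7, 0.1, 0.9))]
example = build_refinement_example("img_0001", preds, truths)
print(example["error_types"])
```

In this toy case the first prediction overlaps a true defect but carries the wrong category, the second overlaps nothing, and one true defect is never predicted, so the record covers all three error types named in the abstract. Images where the first stage is already correct yield `None` here; whether such cases are also included in refinement training is not stated in the abstract.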