Pixel-Level Pavement Distress Assessment Using Instance Segmentation

2026-05-25 | Computer Vision and Pattern Recognition

AI summary

The authors developed a system that detects and outlines longitudinal, transverse, and alligator cracks as well as potholes in road images using Mask R-CNN, an instance segmentation method. They tested it on a custom dataset of road images captured with a vehicle-mounted smartphone and found that the best model, which uses a ResNet-101 FPN backbone, reached an F1 score of about 87% (84% precision, 90% recall). They also retrained a YOLO object detector on the same dataset for comparison; it reached only 27.5% precision and 20.7% recall. Their findings suggest that instance segmentation is a practical approach to assessing road damage, although challenges remain, including inconsistent labeling, rare distress classes, and the lack of mask-level benchmarks.

Mask R-CNN, instance segmentation, pavement distress, crack detection, object detection, ResNet-101 FPN, YOLO, precision and recall, annotated dataset, image segmentation
Authors
Logan Dewick, Bibesh Pyakurel, Kong Pheng Yang, Nazim Choudhury, M. G. Sarwar Murshed
Abstract
Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for maintenance-relevant quantification. This paper presents a vision-based pavement distress analysis system based on Mask R-CNN instance segmentation and evaluates it on UWGB-StreetCrack, a custom field-collected roadway image dataset acquired with a vehicle-mounted smartphone and manually annotated with polygon labels for longitudinal cracks, transverse cracks, alligator cracks, and potholes. Five Detectron2-based Mask R-CNN backbone variants were considered under a consistent fine-tuning protocol. The best-performing model, Mask R-CNN with a ResNet-101 FPN backbone, achieved 84.23% precision, 90.04% recall, and an F1 score of 87.04% under the project-specific bounding-box matching protocol. The same model produced an aggregate predicted crack-area fraction of 2.164%, closely matching the 2.170% ground-truth crack-area fraction. To contextualize the segmentation system against a detector-oriented alternative, a CSPDarknet53-based YOLO detector was also adapted and retrained on the dataset, reaching 27.5% precision and 20.7% recall on the validation protocol. The results show that instance segmentation is a practical direction for field pavement imagery and aggregate crack-area estimation, while also exposing open challenges in annotation consistency, class imbalance, confounder rejection, and mask-level benchmarking.
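The two headline numbers in the abstract can be illustrated with a short sketch. The first function checks that the reported F1 score is the harmonic mean of the reported precision and recall. The second shows one plausible way an aggregate crack-area fraction could be computed from instance masks. The abstract does not describe the authors' exact procedure, so the helper names, the union of overlapping masks within an image, and the pooling of pixels across images are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def union_pixel_count(instance_masks: Sequence[Mask], height: int, width: int) -> int:
    """Count pixels covered by at least one instance mask.

    Assumption: overlapping instances are counted once (mask union).
    """
    covered = 0
    for r in range(height):
        for c in range(width):
            if any(mask[r][c] for mask in instance_masks):
                covered += 1
    return covered


def aggregate_area_fraction(images: Sequence[Tuple[int, int, Sequence[Mask]]]) -> float:
    """Distress pixels divided by total pixels, pooled over all images.

    Assumption: each item is (height, width, instance_masks); the
    paper's actual aggregation may differ.
    """
    distress = sum(union_pixel_count(masks, h, w) for h, w, masks in images)
    total = sum(h * w for h, w, _ in images)
    return distress / total if total else 0.0


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    print(f"F1 = {100 * f1_score(0.8423, 0.9004):.2f}%")  # F1 = 87.04%

    # Toy data (hypothetical): two overlapping masks in one image,
    # plus a second image with no detections.
    mask_a = [[1, 1, 0, 0], [0, 0, 0, 0]]
    mask_b = [[0, 1, 1, 0], [0, 0, 0, 0]]
    toy_images = [(2, 4, [mask_a, mask_b]), (2, 2, [])]
    print(aggregate_area_fraction(toy_images))  # 0.25
```

Under the same definition, the reported predicted fraction of 2.164% differs from the 2.170% ground-truth fraction by 0.006 percentage points, a relative difference of roughly 0.3%. Agreement of an aggregate fraction does not by itself show per-instance mask accuracy, because over- and under-segmentation in different images can cancel out. This is consistent with the abstract listing mask-level benchmarking as an open challenge.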