LPLCv2: An Expanded Dataset for Fine-Grained License Plate Legibility Classification

2026-04-09 • Computer Vision and Pattern Recognition

AI summary

The authors improved a dataset used for recognizing car license plates, making it over three times larger and fixing previous mistakes in the data. They added detailed labels about the plates and vehicles, as well as information about the cameras and weather conditions during capture. They also developed a new training method that helped their recognition model perform much better than before. In addition, they tested how mixing images from the same cameras in training and testing affects results, finding only a small effect. All their data and code are freely available online.

Automatic License Plate Recognition, Dataset annotation, Exponential Moving Average, F1-score, Training procedure, Camera contamination, Bounding boxes, Machine learning, Benchmark, Image capture conditions

Authors

Lucas Wojcik, Eduardo A. F. Machoski, Eduil Nascimento, Rayson Laroca, David Menotti

Abstract

Modern Automatic License Plate Recognition (ALPR) systems achieve outstanding performance in controlled, well-defined scenarios. However, large-scale real-world usage remains challenging due to low-quality imaging devices, compression artifacts, and suboptimal camera installation. Identifying illegible license plates (LPs) has recently become feasible through a dedicated benchmark; however, its impact has been limited by its small size and annotation errors. In this work, we expand the original benchmark to over three times its size with two extra capture days, revise its annotations, and introduce novel labels. LP-level annotations include bounding boxes, text, and legibility level, while vehicle-level annotations comprise make, model, type, and color. Image-level annotations feature camera identity, capture conditions (e.g., rain and faulty cameras), acquisition time, and day ID. We present a novel training procedure featuring an Exponential Moving Average-based loss function and a refined learning rate scheduler, addressing common mistakes in testing. These improvements enable a baseline model to achieve an 89.5% F1-score on the test set, considerably surpassing the previous state of the art. We further introduce a novel protocol that explicitly addresses camera contamination between training and evaluation splits, where results show a small impact. Dataset and code are publicly available at https://github.com/lmlwojcik/LPLCv2-Dataset.
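To make the idea of camera contamination concrete, the following is a minimal sketch of one common way to avoid it: splitting by camera so that no camera contributes images to both the training and the test set. This is an illustration under stated assumptions, not the authors' protocol. The function name `camera_disjoint_split`, the `camera_id` record key, and the `test_fraction` parameter are hypothetical and do not come from the paper or its repository.

```python
import random
from collections import defaultdict


def camera_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split records so that no camera ID appears in both train and test.

    `records` is a list of dicts carrying a hypothetical "camera_id" key.
    `test_fraction` is the fraction of cameras (not images) held out.
    """
    # Group images by the camera that captured them.
    by_camera = defaultdict(list)
    for rec in records:
        by_camera[rec["camera_id"]].append(rec)

    # Shuffle the camera IDs deterministically, then hold out a subset.
    cameras = sorted(by_camera)
    random.Random(seed).shuffle(cameras)
    n_test = max(1, round(len(cameras) * test_fraction))
    test_cameras = set(cameras[:n_test])

    # Every image follows its camera into exactly one split.
    train = [r for c in cameras if c not in test_cameras for r in by_camera[c]]
    test = [r for c in cameras if c in test_cameras for r in by_camera[c]]
    return train, test
```

Comparing a model's score on a split like this against its score on a split where cameras are shared gives a rough measure of how much shared cameras inflate results, which is the kind of effect the abstract reports as small.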
