AI summary
The authors created WUTDet, a large and diverse ship detection dataset with over 100,000 images and 380,000 labeled ships, to support ship detection across varied maritime scenes, such as ports and anchorages, and difficult imaging conditions, such as fog and rain. They evaluated 20 baseline detection models spanning CNN, Transformer, and Mamba architectures on this dataset. Their results show that Transformers achieve the best overall accuracy and small-ship detection, CNNs are faster at inference and better suited to real-time use, and Mamba balances accuracy against computational cost. They also built a unified cross-dataset test set, Ship-GEN, to check whether models trained on WUTDet transfer to other data, and found that WUTDet-trained models generalize better. The dataset supports research on and evaluation of ship detection in complex maritime environments.
Ship detection, Dataset, Computer vision, CNN, Transformer, Mamba architecture, Maritime environments, Small-object detection, Generalization, Inference efficiency
Authors
Junxiong Liang, Mengwei Bao, Tianxiang Wang, Xinggang Wang, An-An Liu, Ryan Wen Liu
Abstract
Ship detection for navigation is a fundamental perception task in intelligent waterway transportation systems. However, existing public ship detection datasets remain limited in scale, in the proportion of small-object instances, and in scene diversity, which hinders the systematic evaluation of detection algorithms and the study of their generalization in complex maritime environments. To this end, we construct WUTDet, a large-scale ship detection dataset. WUTDet contains 100,576 images and 381,378 annotated ship instances, covering diverse operational scenarios such as ports, anchorages, navigation, and berthing, as well as various imaging conditions including fog, glare, low light, and rain, and is therefore both diverse and challenging. Based on WUTDet, we systematically evaluate 20 baseline models from three mainstream detection architectures, namely CNN, Transformer, and Mamba. Experimental results show that the Transformer architecture achieves superior overall detection accuracy (AP) and small-object detection performance (APs), demonstrating stronger adaptability to complex maritime scenes; the CNN architecture maintains an advantage in inference efficiency, making it more suitable for real-time applications; and the Mamba architecture achieves a favorable balance between detection accuracy and computational efficiency. Furthermore, we construct a unified cross-dataset test set, Ship-GEN, to evaluate model generalization. Results on Ship-GEN show that models trained on WUTDet generalize better to different data distributions. These findings demonstrate that WUTDet provides effective data support for the research, evaluation, and generalization analysis of ship detection algorithms in complex maritime scenarios. The dataset is publicly available at: https://github.com/MAPGroup/WUTDet.
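The abstract highlights the proportion of small-object instances and reports APs alongside AP. As a minimal sketch of how such a proportion could be measured, the snippet below buckets bounding boxes by pixel area using the COCO size convention (small below 32², medium below 96², large otherwise). That WUTDet follows this convention is an assumption, since the abstract does not state its size definition; the function names `size_bucket` and `size_proportions` are hypothetical, and box area stands in for instance area.

```python
from collections import Counter

# COCO-style area thresholds in square pixels. Assumption: the abstract does
# not say which size definition WUTDet uses, so these are illustrative only.
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2

BUCKETS = ("small", "medium", "large")


def size_bucket(width, height):
    """Classify one box by its pixel area (hypothetical helper).

    A box whose area sits exactly on a threshold goes to the larger bucket.
    """
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def size_proportions(boxes):
    """Return the fraction of boxes in each size bucket.

    `boxes` is an iterable of (x, y, width, height) tuples in pixels.
    An empty input yields 0.0 for every bucket.
    """
    counts = Counter(size_bucket(w, h) for _, _, w, h in boxes)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in BUCKETS}
    return {name: counts[name] / total for name in BUCKETS}


# Toy boxes, not taken from WUTDet.
example_boxes = [(0, 0, 10, 10), (5, 5, 31, 31), (0, 0, 32, 32), (0, 0, 100, 100)]
example_proportions = size_proportions(example_boxes)
```

Running the same function over each dataset's annotation file would give a like-for-like comparison of small-object share, provided all datasets are measured at their native image resolution.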