Increasing the Efficiency of DETR for Maritime High-Resolution Images

2026-05-11

Computer Vision and Pattern Recognition, Robotics
AI summary

The authors focus on detecting objects at sea so that unmanned surface vessels can navigate safely. They address the difficulty of spotting small or distant objects in high-resolution images in real time on hardware with limited compute and memory. Their method uses Vision Mamba (ViM) backbones, which process an image as a sequence of tokens and scale linearly with sequence length, together with a tailored Feature Pyramid Network and token pruning that skips computation on background regions. Compared with state-of-the-art methods such as RT-DETR with a ResNet50 backbone, the approach achieves a better balance between detection performance and computational efficiency.

maritime object detection, unmanned surface vessels, Vision Mamba (ViM), State Space Models (SSMs), Feature Pyramid Network, tokenization, token pruning, high-resolution imagery, RT-DETR, ResNet50
Authors
Tinsae Yehuala, Hao Cheng, Ville Lehtola
Abstract
Maritime object detection is critical for the safe navigation of unmanned surface vessels (USVs), requiring accurate recognition of obstacles from small buoys to large vessels. Real-time detection is challenging due to long distances, small object sizes, large variations in object scale, edge computing limitations, and the high memory demands of high-resolution imagery. Existing solutions, such as downsampling or image splitting, often reduce accuracy or require additional processing, while memory-efficient models typically handle only limited resolutions. To overcome these limitations, we leverage Vision Mamba (ViM) backbones, which build on State Space Models (SSMs) to capture long-range dependencies while scaling linearly with sequence length. Images are tokenized into sequences for efficient high-resolution processing. For further computational efficiency, we design a tailored Feature Pyramid Network with successive downsampling and SSM layers, as well as token pruning to reduce unnecessary computation on background regions. Compared to state-of-the-art methods like RT-DETR with a ResNet50 backbone, our approach achieves a better balance between performance and computational efficiency in maritime object detection.
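
The two token-level ideas in the abstract, tokenizing an image into a sequence and pruning background tokens, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `tokenize`, `prune_tokens`, and `contrast` are hypothetical, and the contrast-based score is an assumed stand-in for whatever importance score the actual model uses.

```python
def tokenize(image, patch):
    """Split an H x W grayscale image (list of rows) into non-overlapping
    patch tokens. Each token is the flattened pixel list of one patch,
    and tokens are ordered row-major over the patch grid."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def contrast(token):
    """Hypothetical importance score: intensity range inside the patch.
    Flat water or sky patches score 0; patches with structure score higher."""
    return max(token) - min(token)


def prune_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring fraction of tokens, preserving their
    original sequence order. Returns (kept_tokens, kept_indices)."""
    n_keep = max(1, round(keep_ratio * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [tokens[i] for i in keep], keep


# Toy 8 x 8 "sea" image: uniform background with two small bright objects.
image = [[0.0] * 8 for _ in range(8)]
image[2][4], image[3][5] = 1.0, 0.5   # object inside patch (row 1, col 2)
image[6][0] = 0.8                     # object inside patch (row 3, col 0)

tokens = tokenize(image, patch=2)     # 4 x 4 patch grid -> 16 tokens
scores = [contrast(t) for t in tokens]
kept, keep_idx = prune_tokens(tokens, scores, keep_ratio=0.125)

print(len(tokens), keep_idx)          # 16 [6, 12]
```

With a fixed patch size, the number of tokens grows in proportion to the image area, which is why a backbone whose cost is linear in sequence length matters at high resolution, and why discarding background tokens early reduces the work done by every later layer.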