Real-Time Underwater Image Enhancement via Frequency-Guided Dual-Path Attention

2026-06-29
Computer Vision and Pattern Recognition

AI summary

The authors developed a very small, fast model that enhances underwater photos in real time, which matters for underwater cameras and autonomous robots. The model combines spatial image detail with frequency-domain information, using fixed DCT priors in its convolutions and a dual-path attention module, which suits the frequency-sensitive nature of underwater degradation. The design stays lightweight: the extra convolution branches are folded away by structural re-parameterization before inference, and the attention module adds only minimal overhead. In the reported experiments, the model reaches state-of-the-art results with 4.23K parameters at 600+ FPS, outperforming much larger methods in both quantitative metrics and visual quality.

underwater image enhancement, structural re-parameterization, convolutional neural networks, frequency domain, DCT (Discrete Cosine Transform), attention mechanisms, dual-path attention, lightweight model, real-time processing, image restoration
Authors
Leshen Zhang, Ao Li, Ce Zhu
Abstract
Real-time underwater image enhancement (UIE) is crucial for mobile underwater photography and autonomous robotic systems, where practical deployment typically requires low latency and compact models under constrained computational resources. Recent ultra-lightweight CNNs based on structural re-parameterization meet these constraints but operate purely in the spatial domain, ignoring the frequency-sensitive nature of underwater degradation. To address this, we propose a lightweight UIE framework that integrates two key components: a Multi-Branch Reparameterizable Convolution with Fixed DCT Priors (MBRConv-DCT) that injects structured directional frequency priors during training, and a Frequency-Guided Dual-Path Attention (FGDPA) module that fuses spatial and spectral cues via a dual-path design for adaptive feature modulation. Both components are fully compatible with structural re-parameterization: the convolution branch introduces zero additional inference cost after re-parameterization, while the attention module incurs only a minimal computational overhead. Experiments show our model achieves state-of-the-art performance with only 4.23K parameters and 600+ FPS, outperforming much larger methods in both quantitative metrics and visual quality. Code is available at https://github.com/LethyZhang/FGDPA.
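
The abstract's claim that the DCT-prior convolution branch costs nothing at inference follows from the linearity of convolution: parallel branches with same-sized kernels can be summed into one kernel. The sketch below illustrates that general idea only and is not the authors' MBRConv-DCT. It assumes a single channel, no bias or normalization, 3x3 DCT-II basis kernels as the fixed priors, and hypothetical per-branch scales; the names `dct_basis`, `conv2d`, and `merge_branches` are illustrative.

```python
import math
import random

def dct_basis(u, v, n=3):
    """Return the n x n orthonormal 2D DCT-II basis kernel for indices (u, v)."""
    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[alpha(u) * alpha(v)
             * math.cos((2 * i + 1) * u * math.pi / (2 * n))
             * math.cos((2 * j + 1) * v * math.pi / (2 * n))
             for j in range(n)] for i in range(n)]

def conv2d(img, k):
    """Single-channel 2D cross-correlation with no padding."""
    n = len(k)
    h, w = len(img), len(img[0])
    return [[sum(img[y + i][x + j] * k[i][j]
                 for i in range(n) for j in range(n))
             for x in range(w - n + 1)] for y in range(h - n + 1)]

def merge_branches(learned, priors, scales):
    """Fold a learnable kernel and scaled fixed prior kernels into one kernel."""
    n = len(learned)
    return [[learned[i][j] + sum(s * p[i][j] for s, p in zip(scales, priors))
             for j in range(n)] for i in range(n)]

random.seed(0)
img = [[random.random() for _ in range(6)] for _ in range(6)]
learned = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]

# Fixed directional priors (assumption): one kernel varying along columns,
# one varying along rows.
priors = [dct_basis(0, 1), dct_basis(1, 0)]
scales = [0.5, -0.25]  # hypothetical learnable per-branch scales

# Training-time view: run every branch, then sum the weighted outputs.
branch_outs = [conv2d(img, learned)] + [conv2d(img, p) for p in priors]
weights = [1.0] + scales
train_out = [[sum(w * o[y][x] for w, o in zip(weights, branch_outs))
              for x in range(4)] for y in range(4)]

# Inference-time view: one convolution with the merged kernel.
merged = merge_branches(learned, priors, scales)
infer_out = conv2d(img, merged)

max_diff = max(abs(a - b)
               for ra, rb in zip(train_out, infer_out)
               for a, b in zip(ra, rb))
print(f"max abs difference between the two views: {max_diff:.2e}")
```

The two views agree up to floating-point rounding, so the multi-branch structure can be discarded after training in this simplified setting. A real multi-channel layer with biases or normalization would need those terms folded in as well.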