Semantically-Aware Diver Activity Recognition Framework for Effective Underwater Multi-Human-Robot Collaboration

2026-06-10, Robotics

Robotics, Computer Vision and Pattern Recognition
AI summary

The authors created DAR-Net, a transformer-based AI system that helps underwater robots understand what human divers are doing by analyzing underwater scenes. Their approach uses a multi-loss training method that pairs big-picture activity recognition with pixel-level scene information, which helps when visibility is low. They also built a new dataset of over 2,600 annotated underwater diver images with pixel-level masks to help train and test such systems. Tests in a controlled environment showed that DAR-Net identifies six different diver activities more accurately than previous methods, which can improve teamwork between humans and robots underwater.

autonomous underwater vehicles, transformer models, human-robot collaboration, activity recognition, semantic segmentation, multi-loss training, underwater robotics, dataset, low-visibility conditions
Authors
Sadman Sakib Enan, Junaed Sattar
Abstract
Effective multi-human-robot collaboration is essential for expanding human-led operations in the challenging and high-risk underwater environment. For autonomous underwater vehicles (AUVs) to become true teammates, they must be able to comprehend their surroundings and recognize a diver's activities to offer assistance and ensure safety. Towards this goal, we introduce DAR-Net, a novel transformer-based framework that analyzes complex underwater scenes to classify diver activities. Our contribution lies in a semantically guided learning formulation that couples transformer-based temporal reasoning with pixel-level scene supervision. This multi-loss training strategy explicitly aligns global activity recognition with local human-robot interaction semantics, which is particularly critical in low-visibility underwater conditions. To address the significant challenge of data scarcity in this domain, we present the first-ever Underwater Diver Activity (UDA) dataset, a foundational resource containing over 2,600 annotated images with pixel-level masks. Through rigorous experimental evaluations in a controlled environment, we demonstrate that DAR-Net achieves promising accuracy in recognizing six distinct diver activities, outperforming state-of-the-art models. While this dataset provides a crucial baseline, our work serves as a pioneering step, laying the groundwork for future research and facilitating the development of more intelligent, collaborative underwater robotic systems.
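The multi-loss idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulation, so everything here is an assumption for illustration only: the use of cross-entropy for both terms, the weighted-sum combination, the hypothetical `seg_weight` parameter, and the three segmentation classes (background, diver, robot). It is not the authors' implementation of DAR-Net.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class index."""
    return -math.log(softmax(logits)[target])


def multi_loss(activity_logits, activity_label,
               pixel_logits, pixel_labels, seg_weight=0.5):
    """Combine a global activity loss with a pixel-level segmentation loss.

    activity_logits: one score per activity class (six in the paper).
    pixel_logits:    one score list per pixel (class count is assumed).
    seg_weight:      hypothetical weight on the segmentation term.
    """
    # Global term: which activity is the diver performing?
    cls_loss = cross_entropy(activity_logits, activity_label)
    # Local term: mean per-pixel loss against the annotated mask.
    seg_loss = sum(
        cross_entropy(logits, label)
        for logits, label in zip(pixel_logits, pixel_labels)
    ) / len(pixel_labels)
    return cls_loss + seg_weight * seg_loss, cls_loss, seg_loss


# Toy example: 6 activity classes, 4 pixels, 3 assumed mask classes.
activity_logits = [0.0] * 6
pixel_logits = [[0.0, 0.0, 0.0] for _ in range(4)]
total, cls_loss, seg_loss = multi_loss(
    activity_logits, 2, pixel_logits, [0, 1, 2, 1]
)
```

With uniform scores, each term reduces to the log of its class count, so the two components are easy to check by hand. In a real training setup, both terms would be backpropagated through a shared encoder so that the segmentation supervision shapes the features used for activity classification.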