DDStereo: Efficient Dual Decoder Transformers for Stereo 3D Road Anomaly Detection

2026-06-23
Computer Vision and Pattern Recognition

AI summary

The authors address two main problems in stereo-based 3D object detection: making the system fast enough for real-time use and able to recognize objects outside its training set (open-set generalization). They introduce DDStereo, a new method with two small decoder parts working together to detect objects in 2D and then estimate their 3D details efficiently. This design improves both speed and accuracy compared to previous stereo methods and is the first to achieve real-time performance similar to monocular approaches while handling new objects. Their tests show DDStereo works well on public benchmarks for both known and unknown object detection.

Stereo 3D Object Detection, Real-time Performance, Open-set Generalization, Transformer, Dual-Decoder Architecture, Disparity Feature Extractor, Inference Speed, Open-world Detection, 3D Attribute Regression
Authors
Shiyi Mu, Zichong Gu, Zhiqi Ai, Yilin Gao, Shugong Xu
Abstract
Stereo-based 3D object detection still faces two critical safety challenges: real-time performance and open-set generalization. Existing stereo 3D methods typically achieve twice the accuracy of monocular methods but suffer from significantly lower inference speeds, making them unsuitable for real-time applications. Meanwhile, recent advances in open-world detection have introduced open-set and open-vocabulary algorithms in monocular 2D and 3D settings, yet stereo-based open-set detection remains largely unexplored. To bridge this gap, we propose DDStereo, a novel Dual-Decoder Stereo Transformer for real-time open-set 3D object detection. DDStereo features two lightweight decoder branches: one for open-set foreground 2D detection and the other for 3D attribute regression. These decoders share object-level queries to achieve unified target-level alignment. To enhance inference efficiency, we design a compact disparity feature extractor and a streamlined decoder architecture. Experiments on public stereo 3D benchmarks demonstrate that DDStereo achieves state-of-the-art accuracy under both closed-set and open-set protocols. Notably, our method surpasses existing stereo 3D detectors in inference speed and, for the first time, achieves real-time performance comparable to monocular approaches.
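
The shared-query idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say which features each decoder attends to, how many layers are used, or how outputs are parameterized, so everything below is an assumption made for illustration: the names `cross_attention` and `dual_decode`, the single attention step without learned projections, the choice to feed image features to the 2D branch and disparity features to the 3D branch, and the 5-value and 7-value output layouts. The only point it demonstrates is that when both branches consume the same object queries, row `i` of the 2D output and row `i` of the 3D output describe the same target.

```python
import numpy as np

def cross_attention(queries, memory):
    """Single-head scaled dot-product cross-attention with a residual.

    Simplified on purpose: no learned Q/K/V projections, no layer norm.
    queries: (N, D) object queries, memory: (M, D) feature tokens.
    """
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)          # (N, M)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return queries + weights @ memory                 # (N, D)

def dual_decode(queries, image_feats, disparity_feats, w_box2d, w_attr3d):
    """Run two hypothetical decoder branches on the SAME object queries."""
    # Branch 1 (assumed): 2D foreground detection from image features.
    h2d = cross_attention(queries, image_feats)
    boxes_2d = h2d @ w_box2d      # (N, 5): assumed [cx, cy, w, h, objectness]
    # Branch 2 (assumed): 3D attribute regression from disparity features.
    h3d = cross_attention(queries, disparity_feats)
    attrs_3d = h3d @ w_attr3d     # (N, 7): assumed [x, y, z, l, w, h, yaw]
    return boxes_2d, attrs_3d

# Random stand-ins for learned queries, features, and head weights.
rng = np.random.default_rng(0)
N, M, D = 4, 32, 16
queries = rng.normal(size=(N, D))
image_feats = rng.normal(size=(M, D))
disparity_feats = rng.normal(size=(M, D))
w_box2d = rng.normal(size=(D, 5))
w_attr3d = rng.normal(size=(D, 7))

boxes_2d, attrs_3d = dual_decode(
    queries, image_feats, disparity_feats, w_box2d, w_attr3d
)
print(boxes_2d.shape, attrs_3d.shape)
```

Because each query is processed independently of the others in this sketch, reordering the queries reorders both outputs in the same way, which is the alignment property the shared queries are meant to provide. A real decoder with query self-attention would preserve the same index correspondence.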