MUSDA: Multi-source Multi-modality Unsupervised Domain Adaptive 3D Object Detection for Autonomous Driving
2026-05-11 • Computer Vision and Pattern Recognition
AI summary
The authors developed a new way to improve 3D object detection for self-driving cars by using multiple labeled datasets from different sources and combining information from both cameras and LiDAR sensors, without needing new labels for the target environment. Their method uses hierarchical domain classifiers that align camera and LiDAR features at two levels for each source-target pair, and builds a prototype graph to weight and merge the predictions from all source detection heads. They tested their framework on the Waymo, nuScenes, and Lyft datasets and found that it consistently outperformed existing methods, making it easier and more accurate to adapt 3D detectors to new environments.
3D object detection, autonomous driving, domain adaptation, multi-modality, LiDAR, camera sensors, unsupervised learning, prototype graph, multi-source fusion, feature alignment
Authors
Xiaohu Lu, Hamed Khatounabadi, Hayder Radha
Abstract
With the advancement of autonomous driving, numerous annotated multi-modality datasets have become available. This presents an opportunity to develop domain-adaptive 3D object detectors for new environments without relying on labor-intensive manual annotations. However, traditional domain adaptation methods typically focus on a single source domain or a single modality, limiting their effectiveness in multi-source, multi-modality scenarios. In this paper, we propose a novel framework for multi-source, multi-modality unsupervised domain adaptation in 3D object detection for autonomous driving. Given multiple labeled source domains and one unlabeled target domain, our framework first introduces hierarchical spatially-conditioned (HSC) domain classifiers, which jointly align features from both camera and LiDAR modalities at two distinct levels for each source-target domain pair. To effectively leverage information from multiple source domains, we construct a prototype graph between each pair of domains. Based on this, we develop a prototype graph weighted (PGW) multi-source fusion strategy to aggregate predictions from multiple source detection heads. Experimental results on three widely used 3D object detection datasets (Waymo, nuScenes, and Lyft) demonstrate that our proposed framework effectively integrates information across both modalities and source domains, consistently outperforming state-of-the-art methods.
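
The abstract does not detail the HSC classifiers, so the sketch below shows only the generic gradient-reversal (DANN-style) recipe for per-level, per-modality adversarial alignment between one source-target pair. All names here (GradReverse, DomainClassifier, alignment_loss) are illustrative assumptions, not the paper's actual code.

```python
# Minimal PyTorch sketch of multi-level domain-adversarial alignment,
# assuming a standard gradient-reversal setup; the paper's hierarchical
# spatially-conditioned (HSC) design is not specified in the abstract.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses and scales gradients."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class DomainClassifier(nn.Module):
    """Per-level classifier predicting source (0) vs. target (1)."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),
        )

    def forward(self, feat, lam=1.0):
        # Gradient reversal pushes the backbone toward domain-invariant features.
        return self.net(GradReverse.apply(feat, lam))  # (B, 1, H, W) logits

def alignment_loss(classifiers, src_feats, tgt_feats, lam=1.0):
    """Adversarial loss summed over levels and modalities for one
    source-target pair; the feats are lists of maps, one per classifier."""
    bce = nn.BCEWithLogitsLoss()
    loss = 0.0
    for clf, fs, ft in zip(classifiers, src_feats, tgt_feats):
        ls, lt = clf(fs, lam), clf(ft, lam)
        loss = loss + bce(ls, torch.zeros_like(ls)) \
                    + bce(lt, torch.ones_like(lt))
    return loss
```

In this reading, a separate list of classifiers is instantiated per source-target pair, covering camera and LiDAR features at both alignment levels.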
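Similarly, the PGW fusion sketch below is one plausible reading of "prototype graph weighted" aggregation: each source head's predictions are weighted by the similarity between its class prototypes and the target's. The cosine-similarity affinity, the softmax temperature, and the function names (prototype_affinity, pgw_fuse) are all assumptions, not the paper's definition.

```python
# Hypothetical sketch of prototype-graph-weighted (PGW) fusion. The
# abstract only states that a prototype graph between domain pairs
# guides the aggregation of per-source predictions.
import torch
import torch.nn.functional as F

def prototype_affinity(src_protos, tgt_protos):
    """src_protos, tgt_protos: (C, D) per-class mean features for one
    source domain and the target. Returns a scalar domain affinity."""
    return F.cosine_similarity(src_protos, tgt_protos, dim=1).mean()

def pgw_fuse(per_source_scores, per_source_protos, tgt_protos, temp=0.1):
    """per_source_scores: list of S tensors (N, C), class scores from the
    S source detection heads for the same N candidate boxes.
    Returns fused (N, C) scores weighted by source-target affinity."""
    affinities = torch.stack(
        [prototype_affinity(p, tgt_protos) for p in per_source_protos]
    )                                                  # (S,)
    weights = torch.softmax(affinities / temp, dim=0)  # (S,)
    return sum(w * s for w, s in zip(weights, per_source_scores))
```

Under this assumption, source domains whose class prototypes sit closer to the target's in feature space contribute more to the fused detections.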