RT-SDGOD: Real-Time Single-Domain Generalized Object Detection
2026-06-08 • Computer Vision and Pattern Recognition
AI summary
The authors study how real-time object detectors can still work well when the environment changes, like different weather or lighting, without making the detection slower. They identify that these detectors miss more objects because the clues they rely on become less clear and stable. To fix this, they create a new training method where multiple detection queries work together to gather better and more stable clues about each object. Their approach improves detector performance on new, unseen environments without slowing down detection during real use.
real-time object detection, domain shift, DETR, representation learning, query groups, one-to-many supervision, discriminative evidence, cross-domain generalization, training-time methods
Authors
Yupeng Zhang, Fangzhuo Gao, Ruize Han, Wei Feng, Liang Wan
Abstract
In real-world deployment under strict real-time constraints, weather and imaging variations induce significant distribution shifts that severely degrade detector performance. Single-Domain Generalized Object Detection aims to mitigate this issue, yet existing methods rarely investigate, at the level of problem formulation, how well real-time detectors generalize under such constrained inference budgets. To this end, we introduce Real-Time Single-Domain Generalized Object Detection (RT-SDGOD), which focuses on how real-time detectors can achieve cross-domain generalization with zero extra inference overhead by relying solely on training-time representation learning. We observe that, under domain shift, DETR-based real-time detectors degrade mainly through increased missed detections, which stem from limited and unstable object-level discriminative evidence. Based on this observation, we propose RT-SDGDet, a multi-evidence collaborative modeling framework for RT-SDGOD. The core idea is to let multiple queries assigned to the same object collaboratively cover more complete discriminative evidence while keeping the modeling of that evidence stable across views. Specifically, we use one-to-many (O2M) supervision to construct stable object-specific query groups, and further design Discriminative Evidence Diversity Learning (DEDL) to expand object-level evidence coverage and Dual-view Evidence Consistency Learning (DvECL) to improve evidence stability under appearance perturbations. Since all components are introduced only during training, our method incurs no extra inference overhead. Extensive experiments show that the proposed method achieves better generalization performance than existing approaches across multiple unseen target domains.
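The abstract does not give the loss definitions for DEDL or DvECL, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes that each object has a small group of query embeddings (as O2M supervision might produce), that diversity is encouraged by penalizing pairwise cosine similarity within a group, and that stability is encouraged by pulling together the embeddings of the same query under two augmented views. The function names, the cosine-based losses, and the weights `w_div` and `w_con` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon avoids division by zero


def diversity_loss(group):
    """Mean pairwise cosine similarity within one object's query group.

    Hypothetical stand-in for a diversity objective: lower values mean the
    queries in the group attend to more distinct evidence.
    """
    pairs = [(i, j) for i in range(len(group)) for j in range(i + 1, len(group))]
    if not pairs:
        return 0.0
    return sum(cosine(group[i], group[j]) for i, j in pairs) / len(pairs)


def consistency_loss(group_view_a, group_view_b):
    """Mean (1 - cosine) between the same query under two views.

    Hypothetical stand-in for a dual-view consistency objective: lower values
    mean each query's embedding is more stable under appearance perturbation.
    """
    terms = [1.0 - cosine(a, b) for a, b in zip(group_view_a, group_view_b)]
    return sum(terms) / len(terms)


def auxiliary_loss(group_view_a, group_view_b, w_div=1.0, w_con=1.0):
    """Weighted sum of the two illustrative training-only terms."""
    div = 0.5 * (diversity_loss(group_view_a) + diversity_loss(group_view_b))
    con = consistency_loss(group_view_a, group_view_b)
    return w_div * div + w_con * con
```

Because both terms depend only on query embeddings computed during training, a loss of this shape can be dropped at test time, which is consistent with the abstract's claim of no extra inference overhead. In practice the two terms pull in different directions across, rather than within, their scopes: diversity acts between different queries of one object, while consistency acts on the same query across views.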