BEV-Denoise: Learning Intrinsic Noise for Accurate Bird's-Eye-View Semantic Segmentation

2026-06-22 | Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition | Artificial Intelligence
AI summary

The authors introduce BEV-Denoise, a method that cleans up noisy Bird's-Eye-View (BEV) features in autonomous-vehicle perception to make BEV semantic segmentation more accurate. A UNet-based module, inspired by diffusion models, estimates the noise in the BEV features, and this estimate is subtracted before the map decoder makes its final predictions. To supervise the noise estimator, they use a sequential training scheme called Task Decomposition, in which a pre-trained BEV map autoencoder guides the training of the view-transformation encoder. They applied the method to four existing models and report improved results on the nuScenes dataset.
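The inference-time data flow (estimate noise, subtract it, then decode) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `denoise_bev` and `decode_bev_map` are hypothetical names, the UNet is replaced by an arbitrary callable (here an "oracle" that returns the exact synthetic noise), and the decoder is a toy per-cell linear classifier.

```python
import numpy as np

def denoise_bev(bev_features, noise_estimator):
    """Remove estimated intrinsic noise from BEV features of shape (C, H, W).

    `noise_estimator` is a placeholder for the learned UNet-based module
    (assumption: any callable returning an array of the same shape).
    """
    noise_hat = noise_estimator(bev_features)
    if noise_hat.shape != bev_features.shape:
        raise ValueError("estimated noise must have the same shape as the BEV features")
    return bev_features - noise_hat

def decode_bev_map(features, weights):
    """Toy BEV map decoder (hypothetical): per-cell linear projection from
    C channels to K class logits, followed by argmax over classes."""
    logits = np.einsum("kc,chw->khw", weights, features)  # (K, H, W)
    return logits.argmax(axis=0)                          # (H, W) class ids

# Synthetic example: "clean" features corrupted by additive noise.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 8, 8))
noise = 0.5 * rng.normal(size=clean.shape)
noisy = clean + noise

def oracle(_features):
    # Stands in for a perfectly trained noise estimator.
    return noise

denoised = denoise_bev(noisy, oracle)
weights = rng.normal(size=(3, 4))  # 3 hypothetical map classes, 4 channels
pred_map = decode_bev_map(denoised, weights)
```

In practice the estimator is learned and imperfect, so the denoised features only approximate the clean ones; the subtraction itself is what keeps the decoder interface unchanged.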

Bird's-Eye-View (BEV), semantic segmentation, denoising diffusion probabilistic models (DDPM), UNet, noise estimation, view transformation (VT), task decomposition, autoencoder, nuScenes dataset
Authors
Dooseop Choi, Kyounghwan An, Kyoung-Wook Min
Abstract
In this paper, we present a framework dubbed BEV-Denoise that estimates and removes intrinsic noise from learned Bird's-Eye-View (BEV) features to achieve accurate BEV semantic segmentation. Inspired by the noise estimation capability of Denoising Diffusion Probabilistic Models (DDPM), we design a UNet-based noise estimation module that learns to estimate the noise in the learned BEV features. The estimated noise is then subtracted from the BEV features, and the denoised features are fed to BEV map decoders to produce the final predictions. To provide supervision for the noise estimation module, we follow a sequential learning paradigm called Task Decomposition (TD), in which a pre-trained BEV map autoencoder is used to train a view transformation (VT) encoder. We also share three key insights from our extensive experiments that are critical for improved performance. We apply our framework to four existing models encompassing the three major VT paradigms. Experimental results on nuScenes, a large-scale real-world dataset, demonstrate the effectiveness of our framework.
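The abstract does not spell out the exact supervision target for the noise estimation module. One plausible reading of Task Decomposition, which is an assumption here and not taken from the paper, is that the "noise" is the gap between the VT encoder's BEV features and the pre-trained autoencoder's latent code for the ground-truth map. The sketch trains a toy 1x1 channel-mixing linear estimator (a stand-in for the UNet) on that assumed target with plain gradient descent on synthetic data; `td_noise_target` and `fit_linear_noise_estimator` are hypothetical names.

```python
import numpy as np

def td_noise_target(f_vt, z_gt):
    """Assumed target: VT encoder features minus the autoencoder latent
    of the ground-truth BEV map (hypothetical definition)."""
    return f_vt - z_gt

def fit_linear_noise_estimator(f_vt, z_gt, lr=0.2, steps=300):
    """Fit eps_hat = W @ f (a 1x1 linear stand-in for the UNet) by gradient
    descent on the mean squared error to the assumed noise target."""
    c = f_vt.shape[0]
    f = f_vt.reshape(c, -1)                       # (C, N), N = H * W cells
    t = td_noise_target(f_vt, z_gt).reshape(c, -1)
    w = np.zeros((c, c))
    losses = []
    for _ in range(steps):
        residual = w @ f - t                      # eps_hat - target
        losses.append(float(np.mean(residual ** 2)))
        grad = 2.0 * residual @ f.T / residual.size  # d(MSE)/dW
        w -= lr * grad
    return w, losses

# Synthetic stand-ins for the two representations.
rng = np.random.default_rng(0)
z_gt = rng.normal(size=(4, 8, 8))                  # autoencoder latent of GT map
f_vt = z_gt + 0.3 * rng.normal(size=z_gt.shape)    # VT encoder output

w, losses = fit_linear_noise_estimator(f_vt, z_gt)
eps_hat = (w @ f_vt.reshape(4, -1)).reshape(f_vt.shape)
denoised = f_vt - eps_hat  # under TD, this would go to the pre-trained map decoder
```

Under this assumption, subtracting the estimate moves the VT features toward the latent space the pre-trained decoder already understands, which is one way the autoencoder could make the noise estimator's target well defined.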