DrivingVoxels: Compositional Sparse Voxel Rasterization for Dynamic Driving Scene Reconstruction

2026-06-22 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition

AI summaryⓘ

The authors created DrivingVoxels, a new way to show 3D scenes in driving videos that have moving cars and other objects. They use separate 3D voxel maps (octrees) for each moving object and the background, making the process faster and less memory-heavy. Unlike previous methods, they do not rely on neural networks but use LiDAR data to get a good starting point for building the scene. Their method works well on a standard test and trains quicker than some older techniques.

Sparse voxelsOctree3D Gaussian SplattingLiDARNeural-free representationDynamic scenesNeural renderingNovel view synthesisPandaSet benchmark3D reconstruction

Authors

Tania Aguirre, Luis Roldão, Moussab Bennehar, Nathan Piasco, Dzmitry Tsishkou, Simone Rossi, Pietro Michiardi

Abstract

Reconstructing dynamic urban scenes remains challenging due to the unbounded nature of driving environments and the presence of multiple dynamic objects. Currently, potentially faster sparse voxel methods are mainly designed for static scenarios. On the other hand, dynamic approaches based on 3D Gaussian Splatting, despite their high-fidelity, are often time-consuming for driving scenarios and exhibit uncontrollable memory growth in large scenes. To address these limitations, we present DrivingVoxels, a compositional sparse voxel rendering framework for dynamic driving scenes. Our method jointly rasterizes sparse voxels from multiple independent octrees within a single rendering pass. Each rigid dynamic object is represented by an octree defined in its local coordinate frame, while a separate static octree models the stationary background. DrivingVoxels adopts a fully explicit, neural-free representation together with a LiDAR-guided structural initialization that efficiently captures scene geometry. We evaluate our framework on the PandaSet benchmark, demonstrating that DrivingVoxels performs on par on perceptual metrics and better on structural metrics for NVS and reconstruction while requiring shorter training times than previous 3DGS-base methods to an efficient optimization workflow anchored by a strong LiDAR prior.

View PDFOpen arXiv