Incremental Semantics-Aided Meshing from LiDAR-Inertial Odometry and RGB Direct Label Transfer
2026-04-10 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Robotics
AI summary
The authors developed a method that combines camera images and LiDAR scans to create detailed 3D models of large indoor spaces. Their approach uses a smart labeling system that helps fix gaps and errors often found in LiDAR data by adding extra information from pictures. This process gradually builds a high-quality 3D mesh that preserves shapes better than previous methods. Tests show it works well on complex buildings, making the models more useful for virtual reality and digital design.
LiDAR, Inertial Odometry, 3D Mesh Reconstruction, RGB Imaging, Semantic Labeling, Truncated Signed Distance Function (TSDF), Marching Cubes, Point Cloud, Geometric Drift, Universal Scene Description (USD)
Authors
Muhammad Affan, Ville Lehtola, George Vosselman
Abstract
High-fidelity geometric mesh reconstruction from LiDAR-inertial scans remains challenging in large, complex indoor environments, such as cultural buildings, where point cloud sparsity, geometric drift, and fixed fusion parameters produce holes, over-smoothing, and spurious surfaces at structural boundaries. We propose a modular, incremental RGB+LiDAR pipeline that generates high-quality, semantics-aided meshes from indoor scans through scan frame-based direct label transfer. A vision foundation model labels each incoming RGB frame; the labels are incrementally projected and fused onto a LiDAR-inertial odometry map; and an incremental semantics-aware Truncated Signed Distance Function (TSDF) fusion step produces the final mesh via marching cubes. This frame-level fusion strategy preserves the geometric fidelity of LiDAR while leveraging rich visual semantics to resolve geometric ambiguities at reconstruction boundaries caused by LiDAR point-cloud sparsity and geometric drift. We demonstrate that semantic guidance improves geometric reconstruction quality; quantitative evaluation is therefore performed using geometric metrics on the Oxford Spires dataset, while results on the NTU VIRAL dataset are analyzed qualitatively. The proposed method outperforms the state-of-the-art geometric baselines ImMesh and Voxblox, demonstrating the benefit of semantics-aided fusion for geometric mesh quality. The resulting semantically labelled meshes are of value when reconstructing Universal Scene Description (USD) assets, offering a path from indoor LiDAR scanning to XR and digital modeling.
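The direct label transfer step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an undistorted pinhole camera, LiDAR points already transformed into the camera frame, and a simple per-point majority vote as the fusion rule (the abstract does not specify how labels are fused). All function names and parameters (`project_point`, `transfer_labels`, `fuse_votes`, `fx`, `fy`, `cx`, `cy`) are hypothetical.

```python
from collections import Counter


def project_point(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (camera frame) to integer pixel coordinates.

    Returns None when the point lies on or behind the image plane.
    Assumption: no lens distortion.
    """
    x, y, z = point_cam
    if z <= 0.0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return int(round(u)), int(round(v))


def transfer_labels(points_cam, label_image, fx, fy, cx, cy):
    """Give each point the label of the pixel it projects to, or None if unseen."""
    height = len(label_image)
    width = len(label_image[0])
    labels = []
    for point in points_cam:
        pixel = project_point(point, fx, fy, cx, cy)
        if pixel is None:
            labels.append(None)
            continue
        u, v = pixel
        inside = 0 <= u < width and 0 <= v < height
        labels.append(label_image[v][u] if inside else None)
    return labels


def fuse_votes(votes_per_point, new_labels):
    """Add one frame's labels to per-point vote counters and return current winners.

    Assumption: majority voting stands in for the paper's fusion rule.
    """
    for votes, label in zip(votes_per_point, new_labels):
        if label is not None:
            votes[label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes_per_point]


# Toy example: a 4x4 label image whose left half is "wall" and right half is "floor".
label_image = [["wall", "wall", "floor", "floor"] for _ in range(4)]
points = [(-0.5, 0.0, 1.0), (0.4, 0.0, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]

frame_labels = transfer_labels(points, label_image, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
votes = [Counter() for _ in points]
fused = fuse_votes(votes, frame_labels)
print(fused)
```

In this toy case the third point is behind the camera and the fourth projects outside the image, so both stay unlabelled until a later frame observes them. A real pipeline would also need the LiDAR-to-camera extrinsics, time synchronisation, and occlusion handling, none of which are shown here.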