Joint Multi-Camera LiDAR Extrinsic Calibration via Learned Pairwise Initialization and Geometric Refinement
2026-05-29 • Computer Vision and Pattern Recognition
AI summary
The authors noticed that most methods for aligning cameras and LiDAR sensors treat each camera separately, which can lead to good individual results but poor overall system agreement. They created a two-step method: first, a neural network called CMRNext independently estimates each camera’s pose (position and orientation) relative to the LiDAR and finds matching points between images and LiDAR data. Then, these separate estimates are refined together so that all cameras align in a consistent way. Their method improved accuracy and consistency on the KITTI and Walkley datasets, especially when the initial camera estimates were unreliable.
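The point that per-camera results can be individually good yet disagree at the system level can be made concrete with a small sketch. It assumes each extrinsic is a 4x4 LiDAR-to-camera transform and measures inter-camera consistency as the error of the relative pose between two cameras; this is a plausible metric, not necessarily the one the authors use. The helper names (`make_T`, `relative_pose`, `pose_error`) and the rig values are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R


def make_T(rotvec, t):
    """4x4 LiDAR->camera transform from a rotation vector (rad) and translation (m)."""
    T = np.eye(4)
    T[:3, :3] = R.from_rotvec(rotvec).as_matrix()
    T[:3, 3] = t
    return T


def relative_pose(T_ci_l, T_cj_l):
    """Camera-j -> camera-i transform implied by two LiDAR->camera extrinsics."""
    # x_ci = T_ci_l @ x_l and x_l = inv(T_cj_l) @ x_cj
    return T_ci_l @ np.linalg.inv(T_cj_l)


def pose_error(T_est, T_ref):
    """Translation error (cm) and rotation angle error (deg) of T_est w.r.t. T_ref."""
    dT = np.linalg.inv(T_ref) @ T_est
    trans_cm = 100.0 * np.linalg.norm(dT[:3, 3])
    rot_deg = np.degrees(np.linalg.norm(R.from_matrix(dT[:3, :3]).as_rotvec()))
    return trans_cm, rot_deg


# Hypothetical two-camera rig: camera 2 sits 0.5 m to the side of camera 1.
T1_gt, T2_gt = make_T([0, 0, 0], [0.0, 0, 0]), make_T([0, 0, 0], [-0.5, 0, 0])
# Independent per-camera estimates, each off by 1 cm but in opposite directions.
T1_est, T2_est = make_T([0, 0, 0], [0.01, 0, 0]), make_T([0, 0, 0], [-0.51, 0, 0])

per_cam = [pose_error(T1_est, T1_gt), pose_error(T2_est, T2_gt)]
rel = pose_error(relative_pose(T1_est, T2_est), relative_pose(T1_gt, T2_gt))
print(per_cam, rel)  # each camera ~1 cm off, but their relative pose is ~2 cm off
```

Independent errors are not guaranteed to cancel: when they point in opposite directions, as here, they add up in the camera-to-camera geometry, which is the kind of system-level disagreement a joint refinement is meant to remove.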
Camera-LiDAR calibration, Extrinsic calibration, Multi-camera systems, Neural networks, 2D-3D correspondences, Bundle adjustment, Reprojection error, Relative pose, KITTI dataset, Walkley dataset
Authors
Aziz Al-Najjar, Marzieh Amini, James R. Green, Felix Kwamena
Abstract
Most learning-based camera-LiDAR calibration methods treat each camera-LiDAR pair independently, ignoring the rigid geometric coupling in multi-camera platforms. As a result, per-camera estimates may be individually accurate yet inconsistent at the system level. We present a two-stage framework for joint multi-camera LiDAR extrinsic calibration that combines learned pairwise matching with geometric refinement. First, CMRNext is applied independently to each camera to produce initial extrinsic estimates and dense 2D-3D correspondences. These predictions are then jointly refined through a multi-frame bundle adjustment whose cost combines reprojection, per-camera prior, and relative-pose prior terms. This approach converts pairwise predictions into a globally consistent multi-camera calibration. Experiments on the KITTI (in-domain for CMRNext) and Walkley (out-of-domain) datasets show improved per-camera accuracy and inter-camera consistency. On KITTI, the method achieves 0.89 cm translation error and 0.038° rotation error. On Walkley, it reduces translation error from 108.6 cm to 3.1 cm, highlighting the benefit of explicit multi-camera coupling when single-camera predictions are less reliable.
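A joint refinement with the three kinds of terms named in the abstract can be sketched as a nonlinear least-squares problem solved with SciPy's `least_squares`. This is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are as follows:

- The per-camera prior pulls each extrinsic toward its initial (CMRNext-style) estimate.
- The relative-pose prior compares each camera pair against a given camera-to-camera pose; its source, for example a nominal rig layout, is not stated in the abstract.
- Frames enter only as extra correspondences because the extrinsics are treated as fixed over time.
- The pose discrepancy is a simple rotation-vector plus translation difference rather than a proper SE(3) logarithm.
- No robust loss is used.

All function names, weights (`w_px`, `w_cam`, `w_rel`) and synthetic data are hypothetical, and noise-free synthetic projections stand in for learned 2D-3D correspondences.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R


def to_T(x):
    """6-vector [rotvec (rad), t (m)] -> 4x4 LiDAR->camera transform."""
    T = np.eye(4)
    T[:3, :3] = R.from_rotvec(x[:3]).as_matrix()
    T[:3, 3] = x[3:]
    return T


def pose_diff(T_a, T_b):
    """Simplified 6-D discrepancy: rotvec of R_a R_b^T plus translation difference."""
    dR = T_a[:3, :3] @ T_b[:3, :3].T
    return np.concatenate([R.from_matrix(dR).as_rotvec(), T_a[:3, 3] - T_b[:3, 3]])


def project(K, T, pts):
    """Pinhole projection of LiDAR points (N, 3) to pixels (N, 2)."""
    pc = pts @ T[:3, :3].T + T[:3, 3]
    uv = pc @ K.T
    return uv[:, :2] / uv[:, 2:3]


def residuals(x, Ks, corrs, x_init, rel_priors, w_px, w_cam, w_rel):
    Ts = [to_T(xi) for xi in x.reshape(-1, 6)]
    res = []
    # Reprojection terms pooled over all frames: each entry is (camera, LiDAR pts, pixels).
    for cam, pts, pix in corrs:
        res.append(w_px * (project(Ks[cam], Ts[cam], pts) - pix).ravel())
    # Per-camera prior (assumed): stay near each camera's initial estimate.
    for i, x0 in enumerate(x_init):
        res.append(w_cam * pose_diff(Ts[i], to_T(x0)))
    # Relative-pose prior (assumed given): T_i inv(T_j) should match camera-j -> camera-i.
    for (i, j), T_ij in rel_priors.items():
        res.append(w_rel * pose_diff(Ts[i] @ np.linalg.inv(Ts[j]), T_ij))
    return np.concatenate(res)


def refine(Ks, corrs, x_init, rel_priors, w_px=1.0, w_cam=1.0, w_rel=10.0):
    sol = least_squares(residuals, x_init.ravel(),
                        args=(Ks, corrs, x_init, rel_priors, w_px, w_cam, w_rel))
    return sol.x.reshape(-1, 6)


# Synthetic two-camera, two-frame demo (all values hypothetical).
rng = np.random.default_rng(0)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
x_gt = np.array([[0.01, -0.02, 0.005, 0.05, -0.10, 0.02],
                 [-0.01, 0.015, 0.0, -0.50, -0.10, 0.03]])
corrs = []
for _frame in range(2):
    for cam in range(2):
        pts = rng.uniform([-3, -3, 5], [3, 3, 15], size=(60, 3))
        corrs.append((cam, pts, project(K, to_T(x_gt[cam]), pts)))
# Perturbed stand-ins for the per-camera initial estimates.
x_init = x_gt + np.array([0.02, -0.01, 0.01, 0.10, -0.05, 0.08])
rel_priors = {(0, 1): to_T(x_gt[0]) @ np.linalg.inv(to_T(x_gt[1]))}
x_ref = refine([K, K], corrs, x_init, rel_priors)
print(np.abs(x_init - x_gt).max(), np.abs(x_ref - x_gt).max())
```

In this sketch the relative weights decide how much each term is trusted. When per-camera initial estimates are poor, as the abstract reports for the out-of-domain Walkley data, one would plausibly lower `w_cam` relative to the reprojection and relative-pose terms so that the coupling between cameras can correct them.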