MVM-IOD: An Industrial Object-Centric Benchmark Dataset for the Evaluation of 3D Reconstruction Methods

2026-06-15 • Computer Vision and Pattern Recognition


AI summary

The authors created a new dataset called MVM-IOD to help study 3D object reconstruction and camera pose estimation in realistic industrial settings. They captured many images of industrial objects with a camera mounted on a robot arm that moved around each object, and they provide accurate reference 3D point clouds and camera poses. They tested current state-of-the-art methods on this dataset and found that some newer, fast methods do not work as well, because the images differ from their training data. However, simple image preprocessing can make these methods perform better. The authors recommend caution when using these fast methods in certain industrial applications.

3D object reconstruction, camera pose estimation, industrial robotics, dataset, Structure from Motion, Multi-View Stereo, feed-forward methods, Visual Geometry Grounded Transformer, 2D Gaussian Splatting, out-of-distribution

Authors

Robert Langendörfer, Markus Hillemann, Markus Ulrich

Abstract

3D object reconstruction and camera pose estimation in industrial applications are challenging tasks, as errors are costly while computation time is often limited. The complexity of typical industrial objects further complicates these tasks. Most existing datasets in this context do not depict realistic industrial scenarios. Therefore, we introduce the Machine Vision Metrology Industrial Object Dataset (MVM-IOD). Images of typical industrial objects are captured systematically by moving a camera, mounted at the end effector of an industrial robot arm, on a hemisphere around the objects. MVM-IOD contains the acquired RGB images, reference camera poses, and reference 3D point clouds for 9 objects, each recorded with 2 background choices, resulting in 18 scenes. This allows the evaluation of any image-based method that computes a 3D reconstruction, camera poses, or novel views of a scene. Based on MVM-IOD, we extensively evaluate current state-of-the-art (SOTA) 3D reconstruction and camera pose estimation methods, such as Structure from Motion, Multi-View Stereo, recent feed-forward methods (Visual Geometry Grounded Transformer, π3), and 2D Gaussian Splatting, and report our findings as a baseline for future research. The experiments show that capture setups like ours generate out-of-distribution images for feed-forward methods, leading to suboptimal point clouds and camera poses. However, these out-of-distribution images can be shifted closer to the training distribution by applying simple preprocessing steps. Consequently, feed-forward methods should be used with caution in certain industrial applications.
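Comparing estimated camera poses against reference poses, as the dataset is meant to support, usually needs one extra step. Structure from Motion and feed-forward methods return poses in an arbitrary coordinate frame and, from images alone, an arbitrary scale, so the estimates are typically aligned to the reference with a similarity transform before any error is computed. The sketch below shows one common way to do this: Umeyama's closed-form Sim(3) alignment of camera centers, followed by a root-mean-square error. It is an illustrative assumption, not the evaluation protocol used for MVM-IOD. The function names, the world-to-camera pose convention (x_c = R x_w + t), and the synthetic demo data are all hypothetical.

```python
import numpy as np


def camera_center(R_wc: np.ndarray, t_wc: np.ndarray) -> np.ndarray:
    """Camera center in world coordinates for a world-to-camera pose.

    Assumed convention (hypothetical): x_c = R_wc @ x_w + t_wc, so the
    center C satisfies R_wc @ C + t_wc = 0, i.e. C = -R_wc^T @ t_wc.
    """
    return -R_wc.T @ t_wc


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity (s, R, t) with dst ~ s * R @ src + t (Umeyama).

    src, dst: (N, 3) arrays of corresponding points, N >= 3, not collinear.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    var_src = (src_c ** 2).sum() / n          # mean squared distance to centroid
    cov = dst_c.T @ src_c / n                 # 3x3 cross-covariance of dst and src
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # force a proper rotation, no reflection
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def aligned_center_rmse(est_centers, ref_centers) -> float:
    """RMSE of estimated camera centers after Sim(3) alignment, in reference units."""
    est = np.asarray(est_centers, dtype=float)
    ref = np.asarray(ref_centers, dtype=float)
    s, R, t = umeyama_sim3(est, ref)
    aligned = s * est @ R.T + t               # apply s * R @ x + t to every row
    return float(np.sqrt(np.mean(np.sum((aligned - ref) ** 2, axis=1))))


if __name__ == "__main__":
    # Synthetic check: an "estimate" that differs from the reference only by
    # a similarity transform should align with (near) zero error.
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(20, 3))
    theta = np.deg2rad(30.0)
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    t0 = np.array([1.0, -2.0, 0.5])
    est = ((ref - t0) @ Rz) / 3.0             # so that ref = 3 * Rz @ est + t0
    print(aligned_center_rmse(est, ref))      # floating-point-level error only
```

Aligning only the camera centers keeps the sketch short. A fuller evaluation would typically also compare camera orientations after applying the recovered rotation, and would handle outlier or unregistered images explicitly.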
