Uncertainty Quality of VGGT: An Analysis on the DTU Benchmark Dataset

2026-06-15 | Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors studied VGGT, a recent neural network that quickly builds 3D models from multiple images without traditional steps such as feature matching or iterative optimization. They focused on how well VGGT estimates uncertainty, which is important for trusting and checking the quality of 3D reconstructions. Their analysis identifies a confidence cutoff that works well for filtering VGGT's raw output and shows that better uncertainty estimates could make the 3D results more accurate. The work highlights improved uncertainty prediction as a key step toward fast, reliable 3D modeling.

Keywords: Visual Geometry Grounded Transformer, bundle adjustment, feature matching, camera pose estimation, depth map, 3D reconstruction, uncertainty estimation, photogrammetry, neural networks, confidence threshold
Authors
Markus Hillemann, Robert Langendörfer, Steven Landgraf, Markus Ulrich
Abstract
Visual Geometry Grounded Transformer (VGGT) has attracted considerable attention in a short time, not least because it received the Best Paper Award at CVPR 2025. Similar to DUSt3R and MASt3R, VGGT aims to bring about a paradigm shift by replacing established methods such as bundle adjustment and feature matching with a simple, unified, feed-forward neural network that predicts camera poses, depth maps, and dense 3D structure directly from multiple images of a scene within a few seconds. A key aspect is its ability to process an arbitrary number of views consistently in a single forward pass, without any post-processing or iterative optimization. For photogrammetry, this opens new possibilities for real-time, scalable, and accessible 3D reconstruction. In this context, not only high reconstruction accuracy but also high-quality uncertainty estimates are crucial, as they foster trust and enable robust quality assurance. This paper therefore investigates the quality of VGGT's uncertainty predictions. The analysis identifies an effective confidence threshold for filtering VGGT's raw output and demonstrates that improving uncertainty quality holds strong potential for increasing the accuracy of its 3D reconstructions.
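The abstract does not say how the confidence threshold is applied or how uncertainty quality is scored, so the following is only a minimal sketch under stated assumptions. It assumes that each predicted 3D point comes with a scalar confidence (higher meaning more reliable) and with an error against a reference such as DTU's ground-truth scans, all flattened into plain lists. It shows three illustrative steps: filtering points by a threshold, sweeping thresholds to trade completeness against mean error, and a sparsification check that compares confidence-based point removal with an oracle that removes points by their true error. The function names and the sparsification metric are hypothetical choices for illustration, not the authors' evaluation protocol.

```python
from typing import List, Sequence, Tuple

# Hypothetical data layout: one entry per predicted point, flattened over all views.
Point = Tuple[float, float, float]


def filter_by_confidence(
    points: Sequence[Point], confidences: Sequence[float], threshold: float
) -> List[Point]:
    """Keep points whose confidence is at least `threshold` (higher = more reliable, assumed)."""
    if len(points) != len(confidences):
        raise ValueError("points and confidences must have the same length")
    return [p for p, c in zip(points, confidences) if c >= threshold]


def threshold_sweep(
    errors: Sequence[float], confidences: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, fraction of points kept, mean error of kept points) per threshold."""
    n = len(errors)
    results = []
    for t in thresholds:
        kept = [e for e, c in zip(errors, confidences) if c >= t]
        if kept:
            results.append((t, len(kept) / n, sum(kept) / len(kept)))
        else:
            # Nothing survives this threshold, so the mean error is undefined.
            results.append((t, 0.0, float("nan")))
    return results


def _mean_after_removal(ordered_errors: List[float], fraction: float) -> float:
    """Mean error after dropping the first `fraction` of `ordered_errors`."""
    k = int(round(fraction * len(ordered_errors)))
    remaining = ordered_errors[k:]
    return sum(remaining) / len(remaining) if remaining else 0.0


def sparsification_errors(
    errors: Sequence[float], confidences: Sequence[float], fractions: Sequence[float]
) -> List[Tuple[float, float]]:
    """Gap between confidence-based and oracle (true-error-based) removal.

    A gap of 0 at every fraction means the confidences rank points exactly
    like their true errors; larger gaps indicate poorer uncertainty quality.
    """
    # Confidence-based order: least confident points are removed first.
    by_confidence = [e for _, e in sorted(zip(confidences, errors), key=lambda ce: ce[0])]
    # Oracle order: points with the largest true error are removed first.
    oracle = sorted(errors, reverse=True)
    return [
        (f, _mean_after_removal(by_confidence, f) - _mean_after_removal(oracle, f))
        for f in fractions
    ]


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    errors = [0.1, 0.2, 0.5, 1.0]
    confidences = [0.9, 0.8, 0.3, 0.1]
    print(threshold_sweep(errors, confidences, [0.0, 0.5, 0.95]))
    print(sparsification_errors(errors, confidences, [0.25, 0.5, 0.75]))
```

In the toy example the confidences rank the points exactly like their true errors, so every sparsification gap is zero; inverting the confidences would produce positive gaps. A threshold sweep of this kind makes the trade-off explicit: raising the threshold lowers the mean error of the retained points but also discards more of the reconstruction.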