Lightweight Neural Framework for Robust 3D Volume and Surface Estimation from Multi-View Images

2026-06-22 | Computer Vision and Pattern Recognition

AI summary

The authors developed a fast method to estimate the volume and surface area of objects from images taken at different angles. Their approach combines 3D point clouds with 2D image features using a graph-based model, avoiding slow iterative optimization. Tests on tasks such as coral monitoring and body measurement show that the method works well even with few or noisy images, which makes it practical when only limited imagery is available. The technique offers a scalable solution for shape analysis from visual data.

volume estimation, surface area estimation, multi-view images, 3D point cloud, graph-based decoder, scale normalization, iterative optimization, shape analysis, sparse data, feed-forward framework
Authors
Diego E. Farchione, Ramzi Idoughi, Peter Wonka
Abstract
Accurate volume and surface area estimation is critical for diverse applications, from marine ecology to medical diagnostics. However, existing methods often incur high computational costs and perform poorly on sparse or noisy data. We propose a fully feed-forward framework that regresses scale-normalized volume and surface area, together with their associated uncertainties, directly from multi-view images. By fusing 3D point cloud reconstructions with view-aligned 2D features through a graph-based decoder, our model bypasses iterative optimization, which keeps inference fast and scalable. Experimental results demonstrate that our approach outperforms state-of-the-art methods, particularly when only a few input images are available. Validated across coral monitoring, dietary analysis, and anthropometry, the framework provides a robust, adaptable alternative for quantitative shape analysis from visual data that maintains high performance even in resource-constrained or sparse-view scenarios.
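
The abstract does not spell out how scale normalization is done, so the following is only a minimal sketch under stated assumptions: the point cloud is centered and divided by its largest radius (a hypothetical choice), and predictions made in that normalized space are mapped back to metric units. The helper names `normalize_points` and `denormalize` are illustrative and do not come from the paper. The only fact relied on is geometric: under a uniform scale factor s, volume grows as s^3 and surface area as s^2, and a predicted standard deviation scales by the same factor as the quantity it describes.

```python
import math

def normalize_points(points):
    """Center a 3D point cloud and scale it so its farthest point has radius 1.

    Assumption: this unit-radius convention is illustrative, not the paper's.
    Returns the normalized points and the scale factor that was divided out.
    """
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    scale = max(math.sqrt(sum(c * c for c in q)) for q in centered)
    if scale == 0.0:
        raise ValueError("degenerate point cloud: all points coincide")
    normalized = [[c / scale for c in q] for q in centered]
    return normalized, scale

def denormalize(volume_norm, area_norm, scale, volume_std_norm=0.0, area_std_norm=0.0):
    """Map scale-normalized predictions (and their std. devs.) to metric units."""
    return {
        "volume": volume_norm * scale ** 3,          # volume scales cubically
        "area": area_norm * scale ** 2,              # area scales quadratically
        "volume_std": volume_std_norm * scale ** 3,  # std follows the same factor
        "area_std": area_std_norm * scale ** 2,
    }

# Worked check on a cube of side 2, described by its eight corners.
side = 2.0
corners = [(x, y, z) for x in (0.0, side) for y in (0.0, side) for z in (0.0, side)]
normalized, scale = normalize_points(corners)

# Stand-in for a network output: the exact normalized volume and area of the cube.
side_norm = side / scale
result = denormalize(side_norm ** 3, 6 * side_norm ** 2, scale)
```

Working in a normalized space lets a regressor see targets in a bounded range regardless of whether the object is a coral colony or a plate of food; the metric answer is recovered afterwards from the stored scale factor.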