GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
2026-05-22 • Computer Vision and Pattern Recognition
AI summary
The authors present a method for building detailed 3D models of indoor scenes from multiple ordinary RGB photos taken from different viewpoints. They divide the scene into small overlapping chunks and use a strong generative 3D shape model, Trellis.2, to reconstruct the entire space chunk by chunk. Their main contribution is a conditioning mechanism that combines information from the different views into a single 3D representation that stays consistent regardless of the order in which the photos are supplied. The result is more faithful, editable 3D models of rooms, with a reported 16% improvement over the previous best reconstruction methods.
3D scene reconstruction, multi-view RGB images, generative 3D prior, Trellis.2, projection-based conditioning, multi-view consistency, PBR mesh, indoor environments, conditional 3D generation
Authors
Katharina Schmid, Nicolas von Lützow, Jozef Hladký, Angela Dai, Matthias Nießner
Abstract
We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially-localized, overlapping chunks that together tile the scene, scaling generation to large scene extents. Crucially, we inherit the fidelity and completeness of state-of-the-art generative shape models (we use Trellis.2 as an example), which we generalize to the scene level. To this end, we propose a projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene, yielding high-fidelity, multi-view consistent generated geometry. This enables lifting the strong object-level prior of Trellis.2 to multi-view, scene-scale generation, producing faithful, editable PBR mesh reconstructions of indoor environments. As a result, we obtain high-fidelity results that outperform cutting-edge reconstruction methods by 16%.
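The idea of lifting posed image features into a view-order-independent 3D representation can be illustrated with a minimal sketch. The abstract does not specify the actual conditioning mechanism, so everything below is an assumption made for illustration: the function name `lift_features`, the pinhole camera model, nearest-pixel sampling, and mean aggregation over the views that see each point are all hypothetical choices, not the authors' implementation.

```python
import numpy as np

def lift_features(points, feats, intrinsics, world_to_cam):
    """Hypothetical sketch: average per-view image features at 3D points.

    points:       (N, 3) world-space positions (e.g. voxel centers of a chunk)
    feats:        (V, H, W, C) per-view feature maps
    intrinsics:   (V, 3, 3) pinhole intrinsics
    world_to_cam: (V, 4, 4) world-to-camera extrinsics
    Returns (N, C) mean features and (N,) number of views that saw each point.
    """
    n = points.shape[0]
    v, h, w, c = feats.shape
    homog = np.concatenate([points, np.ones((n, 1))], axis=1)  # (N, 4)
    acc = np.zeros((n, c))
    count = np.zeros((n, 1))
    for i in range(v):
        cam = (world_to_cam[i] @ homog.T).T[:, :3]   # points in camera frame
        z = cam[:, 2]
        in_front = z > 1e-6                          # discard points behind the camera
        safe_z = np.where(in_front, z, 1.0)          # avoid division by zero
        pix = (intrinsics[i] @ cam.T).T              # unnormalized pixel coordinates
        col = np.round(pix[:, 0] / safe_z).astype(int)
        row = np.round(pix[:, 1] / safe_z).astype(int)
        valid = in_front & (col >= 0) & (col < w) & (row >= 0) & (row < h)
        acc[valid] += feats[i, row[valid], col[valid]]  # nearest-pixel sampling
        count[valid] += 1
    # A mean over visible views is a symmetric function, so permuting
    # the views does not change the result.
    return acc / np.maximum(count, 1), count[:, 0]
```

Because the aggregation is a plain mean over the views in which a point is visible, shuffling the input views leaves the output unchanged, and the features are anchored to world-space positions rather than to any particular camera. A learned aggregation would presumably replace the mean in practice; the sketch only shows why projection-based lifting can be made independent of view ordering.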