VisDom: Sparse Novel View Synthesis with Visible Domain Constraint

2026-06-18 · Computer Vision and Pattern Recognition

AI summary

The authors address novel view synthesis of objects from very few images, which is difficult because sparse views leave the 3D shape ambiguous. They introduce VisDom, a constraint that keeps only regions of space observed by a minimum number of input views, filtering out spurious geometry that silhouette-based (visual hull) methods alone would retain. VisDom requires no training and can be added to existing NeRF and Gaussian Splatting pipelines to obtain better geometry from as few as four images. Experiments on three datasets show more accurate and consistent reconstructions, and the authors report matching or surpassing GaussianObject at 22 times lower training cost.

Novel View Synthesis, NeRF, Gaussian Splatting, Visual Hull, Silhouette Consistency, Multi-view Visibility, 3D Reconstruction, Sparse Views, Volumetric Sampling, Object-centric Modeling
Authors
Mariia Gladkova*, Tarun Yenamandra*, Edmond Boyer, Robert Maier, Tony Tung, Daniel Cremers
Abstract
Sparse novel view synthesis (NVS) remains challenging due to the ambiguity of recovering 3D geometry from few input views. While NeRF- and Gaussian Splatting (GS)-based methods perform well with dense supervision, they often overfit in sparse settings, producing floating artifacts and inconsistent geometry. Silhouette consistency is commonly used as a regularizer, but it remains insufficient, as silhouette-consistent regions can extend beyond the true object geometry. We introduce VisDom, a learning-free geometric constraint that augments classical carving-based visual hull reconstruction by enforcing a minimum multi-view visibility requirement. Specifically, we define a visible domain as the subset of 3D space observed by at least $K$ views and use it as an additional filtering criterion on top of standard silhouette-based reconstruction. This provides a stronger spatial prior in sparse-view settings. We integrate VisDom into both implicit (NeRF) and explicit (GS) pipelines by restricting volumetric sampling and guiding Gaussian placement during optimization. Experiments on three challenging datasets show consistent improvements in sparse-view NVS, enabling high-quality object-centric reconstruction from as few as four input images. Our method is domain-agnostic, requires only silhouettes, and introduces no learned parameters, making it a simple complement to existing approaches. Applying VisDom on top of GaussianObject further improves performance on Omni3D and MipNeRF360, while matching or surpassing it at $22\times$ lower training cost.
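
To make the visible domain idea concrete, the following is a minimal sketch of how a single 3D point might be tested against both criteria described in the abstract: silhouette consistency and visibility in at least $K$ views. It is an illustration under stated assumptions, not the authors' implementation. The names `project`, `in_visible_hull`, and `min_views` are hypothetical; cameras are assumed to be pinhole models given as 3x4 projection matrices whose third row yields positive depth in front of the camera; and the choice to let views that do not observe a point neither count toward $K$ nor carve it is an assumption, since the abstract does not specify that detail.

```python
import math


def project(P, point):
    """Project a 3D world point with a 3x4 matrix; return (u, v, depth) or None."""
    x, y, z = point
    a, b, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    if abs(w) < 1e-9:  # point lies on the camera's principal plane
        return None
    return a / w, b / w, w


def in_visible_hull(point, cameras, masks, min_views):
    """Keep a point only if it is silhouette-consistent in every view that
    observes it AND it is observed by at least `min_views` views.

    masks[i][row][col] is True for foreground pixels of view i (assumed layout).
    """
    seen = 0
    for P, mask in zip(cameras, masks):
        proj = project(P, point)
        if proj is None:
            continue
        u, v, depth = proj
        col, row = math.floor(u), math.floor(v)
        height, width = len(mask), len(mask[0])
        if depth <= 0 or not (0 <= col < width and 0 <= row < height):
            continue  # outside this view's frustum: neither counts nor carves
        if not mask[row][col]:
            return False  # classical carving: background pixel rejects the point
        seen += 1
    return seen >= min_views  # visible-domain test on top of carving


# Toy setup: two 10x10 cameras looking down +z, the second shifted by 1 along x.
cam1 = [[10, 0, 5, 0], [0, 10, 5, 0], [0, 0, 1, 0]]
cam2 = [[10, 0, 5, -10], [0, 10, 5, 0], [0, 0, 1, 0]]
full = [[True] * 10 for _ in range(10)]  # every pixel is foreground
points = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0), (0.0, 0.0, -2.0)]

print([in_visible_hull(p, [cam1, cam2], [full, full], 2) for p in points])
# -> [True, False, False]
```

In this toy setup the second point falls inside only one camera's field of view, so carving alone never rejects it, while the visibility requirement with `min_views=2` does. Setting `min_views=0` in this sketch recovers plain silhouette carving, which keeps all three points.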