SyncFix: Fixing 3D Reconstructions via Multi-View Synchronization

2026-04-13 · Computer Vision and Pattern Recognition

AI summary

The authors created SyncFix, a method that improves 3D scene reconstructions by keeping different views consistent with one another during a diffusion-based refinement process. SyncFix works by linking noisy and clean versions of the scene across multiple viewpoints at the same time, fixing errors in both shape and meaning. It is trained only on image pairs, yet it can handle many views at inference time. Tests show SyncFix produces better reconstructions than existing methods, even without any clean reference images, and it performs better still when a few clean references are available.

diffusion models, scene reconstruction, cross-view consistency, latent space, denoising, 3D reconstruction, semantic consistency, geometric consistency, multi-view learning, image pairs
Authors
Deming Li, Abhay Yadav, Cheng Peng, Rama Chellappa, Anand Bhattad
Abstract
We present SyncFix, a framework that enforces cross-view consistency during the diffusion-based refinement of reconstructed scenes. SyncFix formulates refinement as a joint latent bridge matching problem, synchronizing distorted and clean representations across multiple views to fix semantic and geometric inconsistencies. In other words, SyncFix learns a joint conditional distribution over multiple views, enforcing consistency throughout the denoising trajectory. Training uses only image pairs, yet the model generalizes naturally to an arbitrary number of views at inference. Moreover, reconstruction quality improves with additional views, with diminishing returns at higher view counts. Qualitative and quantitative results demonstrate that SyncFix consistently generates high-quality reconstructions and surpasses current state-of-the-art baselines, even in the absence of clean reference images. SyncFix achieves even higher fidelity when sparse references are available.
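The core idea, refining each view's latent along a bridge toward a clean target while repeatedly coupling the views so they stay consistent, can be illustrated with a toy sketch. This is not the authors' model: `sync_refine` and `sync_weight` are hypothetical names, the clean target stands in for the output of a learned denoiser, and mean-pooling across views stands in for whatever learned cross-view coupling SyncFix actually uses.

```python
import numpy as np

def sync_refine(distorted, clean, steps=10, sync_weight=0.1):
    """Toy multi-view bridge refinement (illustrative only).

    distorted, clean: arrays of shape (V, D) -- one D-dim latent per view.
    Each step moves every view a fraction of the remaining way toward its
    clean target (a stand-in for a learned bridge/denoising update), then
    pulls all views toward their shared mean (a stand-in for the learned
    cross-view synchronization). Note the update is per-view, so it works
    for any number of views V, mirroring the pairs-to-many-views claim."""
    x = distorted.astype(float).copy()
    for s in range(steps):
        # bridge step: cover a fraction of the remaining distance to the target
        x = x + (clean - x) / (steps - s)
        # synchronization step: blend each view with the multi-view mean
        x = (1 - sync_weight) * x + sync_weight * x.mean(axis=0, keepdims=True)
    return x

# Tiny demo: 3 views with 4-dim latents, corrupted by Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.normal(size=(3, 4))
distorted = clean + rng.normal(size=(3, 4))
refined = sync_refine(distorted, clean)
err_before = np.linalg.norm(distorted - clean)
err_after = np.linalg.norm(refined - clean)
```

Because the synchronization step is applied inside the refinement loop rather than as a post-hoc averaging, consistency is enforced along the whole trajectory, which is the structural point the abstract makes about learning a joint conditional over views.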