WebSpline: Structure-Informed Splines for Real-Time 3D Gaussians from Monocular Videos

2026-06-01 · Computer Vision and Pattern Recognition

AI summary

The authors present WebSpline, a method for reconstructing moving 3D scenes from single-camera videos. It models the trajectory of each dynamic 3D Gaussian with a Structure-Informed Spline (SIS), a learnable cubic Hermite spline whose motion is organized by a Structural Proxy Graph (SPG), so that the scene's shape stays consistent while detailed motion is captured. Training has two stages: the SPG is first initialized from tracked 2D points and refined with temporal rigidity regularization, then the splines are initialized from the refined SPG and optimized under spatial and structural neighborhood constraints. Because motion at inference comes only from evaluating the learned splines, rendering is fast. On the iPhone and NVIDIA benchmarks, the authors report state-of-the-art rendering quality while rendering over 10 times faster than WorldTree, the second-best method on the iPhone dataset.

dynamic scene reconstruction, monocular videos, 3D Gaussian framework, Structure-Informed Spline (SIS), cubic Hermite spline, Structural Proxy Graph (SPG), temporal rigidity regularization, spatial neighborhood constraints, fast rendering, motion modeling
Authors
Jongmin Park, Jeonghwan Yun, Minh-Quan Viet Bui, Munchurl Kim
Abstract
Dynamic scene reconstruction from monocular videos remains highly challenging, as existing methods often struggle to balance global structural coherence and local fine-grained details under limited multi-view cues. To address this challenge, we propose WebSpline, a novel dynamic 3D Gaussian framework that enables structurally coherent and high-fidelity reconstruction from monocular videos with fast rendering. The core of WebSpline is the Structure-Informed Spline (SIS) representation, which models each dynamic Gaussian trajectory using a learnable cubic Hermite spline whose motion is structurally organized with an auxiliary Structural Proxy Graph (SPG). The proposed framework is optimized in two stages: (i) the SPG is initialized from 2D point tracks and refined with temporal rigidity regularization to establish structural coherence for moving objects across the sequence; and (ii) the SIS representation is initialized from the refined SPG and optimized under both spatial and structural neighborhood constraints. At inference, Gaussian motion is obtained solely by evaluating the learned SIS, enabling fast rendering. Extensive experiments on two challenging monocular dynamic scene benchmarks, iPhone and NVIDIA, demonstrate that WebSpline achieves state-of-the-art rendering quality while rendering over 10 times faster than WorldTree, the second-best method on the iPhone dataset.
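
As an illustration of why evaluating a spline at inference is cheap, the following is a minimal sketch of evaluating a generic piecewise cubic Hermite spline for one 3D trajectory. It is not the authors' implementation: the function name `hermite_position`, the knot layout, and the use of per-knot positions and tangents as the learnable parameters are assumptions made for this example, and the SPG-based structural organization described in the abstract is not modeled here.

```python
from bisect import bisect_right


def hermite_position(t, knot_times, knot_positions, knot_tangents):
    """Evaluate a piecewise cubic Hermite spline at time t (hypothetical helper).

    knot_times:     strictly increasing list of floats, length >= 2
    knot_positions: one 3-tuple per knot (assumed learnable in this sketch)
    knot_tangents:  one 3-tuple per knot (assumed learnable in this sketch)
    """
    # Clamp t to the time range covered by the knots.
    t = min(max(t, knot_times[0]), knot_times[-1])

    # Locate the segment [knot_times[i], knot_times[i + 1]] containing t.
    i = min(bisect_right(knot_times, t) - 1, len(knot_times) - 2)
    h = knot_times[i + 1] - knot_times[i]
    s = (t - knot_times[i]) / h  # normalized time in [0, 1]

    # Standard cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2

    p0, p1 = knot_positions[i], knot_positions[i + 1]
    m0, m1 = knot_tangents[i], knot_tangents[i + 1]

    # Tangents are scaled by the segment length h because s is normalized.
    return tuple(
        h00 * a + h10 * h * b + h01 * c + h11 * h * d
        for a, b, c, d in zip(p0, m0, p1, m1)
    )


# Example trajectory with three knots and zero tangents (illustrative values).
times = [0.0, 1.0, 2.0]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
tangents = [(0.0, 0.0, 0.0)] * 3
midpoint = hermite_position(0.5, times, positions, tangents)
```

Each query costs one segment lookup and a handful of multiply-adds per coordinate, with no network evaluation, which is consistent with the abstract's statement that motion is obtained solely by evaluating the learned spline.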