Learning Efficient 4D Gaussian Representations from Monocular Videos with Flow Splatting

2026-06-29 • Computer Vision and Pattern Recognition

AI summary

The authors tackle the difficulty of creating detailed 3D videos from just one camera that shows moving scenes. They improve on past methods by introducing 'Flow Splatting,' which uses a velocity field to better understand and display motion over time. Instead of slow and memory-heavy approaches, their technique renders motion and appearance efficiently using a new way to represent and process dynamic 3D information. Their tests show that this method is faster and produces clearer images than previous techniques.

3D Gaussian Splatting, monocular video, dynamic scene reconstruction, velocity field, optical flow, volume rendering, 4D volumes, spatiotemporal modeling, rendering speed, temporal dynamics

Authors

Shengjun Zhang, Jinzhao Li, Xin Fei, Yueqi Duan

Abstract

Reconstructing dynamic 3D scenes from monocular videos is challenging due to scene complexity and temporal dynamics. With the advancement of 3D Gaussian Splatting in novel view synthesis, existing methods extend 3D Gaussians to the 4D domain with deformation fields, trajectories, or spatiotemporal 4D volumes to model scene element deformation. However, these methods suffer from long training time, low rendering speed, or high memory consumption for per-frame reconstruction of 4D volumes, and they do not fully exploit dense dynamic information. To address this issue, we propose Flow Splatting, which constructs a velocity field and enables the conventional splatting technique to render optical flow from that field, so that the rendered flow can supervise the learning of dynamics from monocular videos. Specifically, we extend 4D volumes with time-varying means and covariances to represent complex dynamics. Then, we construct and approximate the velocity field naturally based on this representation. While conventional volume rendering techniques support rendering color fields, we extend the volume rendering strategy to splat the velocity field by accounting for the influence of camera motion. We conduct experiments on various benchmarks to demonstrate the efficiency and effectiveness of our method. Compared to state-of-the-art methods, our model achieves better image quality with less training time and higher rendering speed.
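
To make the idea of rendering optical flow from a velocity field concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera, treats each Gaussian only by its center, approximates motion over a short interval as linear (mean plus velocity times `dt`), and blends per-Gaussian flows with standard front-to-back alpha compositing. The function names `project`, `gaussian_flow`, and `composite_flow` and all parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def project(K, R, t, x):
    """Pinhole projection of a world-space point x to pixel coordinates."""
    xc = R @ x + t          # world -> camera coordinates
    uvw = K @ xc            # camera -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]


def gaussian_flow(K, R0, t0, R1, t1, mu, vel, dt):
    """2D flow of one Gaussian center between two frames.

    Assumption: the center moves linearly, mu -> mu + vel * dt, and the
    camera pose changes from (R0, t0) to (R1, t1). Both effects contribute
    to the observed pixel displacement.
    """
    p0 = project(K, R0, t0, mu)
    p1 = project(K, R1, t1, mu + vel * dt)
    return p1 - p0


def composite_flow(flows, alphas):
    """Front-to-back alpha compositing of per-Gaussian flows at one pixel."""
    out = np.zeros(2)
    transmittance = 1.0
    for f, a in zip(flows, alphas):
        out += transmittance * a * np.asarray(f, dtype=float)
        transmittance *= 1.0 - a
    return out


# Illustrative setup (all values are made up for the example).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
I3 = np.eye(3)
zero = np.zeros(3)
mu = np.array([0.0, 0.0, 2.0])
vel = np.array([1.0, 0.0, 0.0])

# Moving point, static camera.
flow_static_cam = gaussian_flow(K, I3, zero, I3, zero, mu, vel, 0.1)
# Moving point, camera translating with it (camera center moves +0.1 in x).
flow_tracking_cam = gaussian_flow(K, I3, zero, I3, np.array([-0.1, 0.0, 0.0]),
                                  mu, vel, 0.1)
# Two Gaussians overlapping one pixel.
pixel_flow = composite_flow([[5.0, 0.0], [1.0, 0.0]], [0.5, 0.5])
```

The second case illustrates why camera motion has to be taken into account: a point that moves in the world can produce zero image-space flow when the camera moves with it, so the observed optical flow alone does not determine the scene velocity.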
