Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

2026-05-25 · Computer Vision and Pattern Recognition

AI summary

The authors address the challenge of creating complete digital twins (3D models of real scenes) from videos. This is hard with conventional perspective videos, whose limited field of view leads to inconsistencies across views and over time. They propose using 360° videos instead, which cover the whole scene and help keep visuals consistent. Their system, Pantheon360, reconstructs a 3D structure from sparse 360° video input and uses it as a geometric guide, so the diffusion model can focus on making textures look realistic while the geometry stays accurate. Experiments show that this approach produces better-looking and more consistent 360° videos for use in simulation and digital twin creation.

Digital Twin, 360° Video, Video Diffusion, 3D Cache, Camera Path, Spatial-Temporal Consistency, Panoramic Coverage, Photorealistic Texture, Geometric Coherence
Authors
Ting-Hsuan Chen, Ying-Huan Chen, Tao Tu, Jie-Ying Lee, Cho-Ying Wu, Fangzhou Lin, Hengyuan Zhang, David Paz, Xinyu Huang, Yuliang Guo, Yu-Lun Liu, Yue Wang, Liu Ren
Abstract
Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency. These constraints remain challenging for perspective video generators due to their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides a strong global context for maintaining coherence. We introduce Pantheon360, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.
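
To make the idea of a geometric scaffold concrete, here is a minimal sketch of one way a 3D cache could be re-rendered along a camera path. The abstract does not specify how the 3D Cache is represented or rendered, so everything here is an assumption for illustration: the cache is taken to be a colored point cloud lifted from a single equirectangular panorama with known per-pixel depth, the camera only translates (no rotation), and the function names `equirect_rays`, `build_point_cache`, and `render_point_cache` are hypothetical, not part of Pantheon360.

```python
import numpy as np


def equirect_rays(h, w):
    """Unit ray direction for each pixel center of an h x w equirectangular image.

    Assumed convention: x right, y up, z forward; longitude 0 looks along +z.
    """
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both of shape (h, w)
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )


def build_point_cache(depth, color):
    """Lift a panorama (radial depth + color) into a colored point cloud."""
    h, w = depth.shape
    points = equirect_rays(h, w) * depth[..., None]
    return points.reshape(-1, 3), color.reshape(h * w, -1)


def render_point_cache(points, colors, cam_pos, h, w):
    """Project the point cloud into an equirectangular view at cam_pos.

    Returns the rendered image and a boolean mask of pixels hit by a point.
    Pixels outside the mask are holes that a generative model would fill.
    """
    rel = points - np.asarray(cam_pos, dtype=float)
    dist = np.linalg.norm(rel, axis=1)
    keep = dist > 1e-8  # drop points that coincide with the camera center
    rel, dist, colors = rel[keep], dist[keep], colors[keep]

    lon = np.arctan2(rel[:, 0], rel[:, 2])
    lat = np.arcsin(np.clip(rel[:, 1] / dist, -1.0, 1.0))
    u = np.floor((lon + np.pi) / (2.0 * np.pi) * w).astype(int) % w
    v = np.clip(np.floor((np.pi / 2.0 - lat) / np.pi * h).astype(int), 0, h - 1)

    # Depth test: sort by pixel, then by distance, and keep the nearest point.
    flat = v * w + u
    order = np.lexsort((dist, flat))
    pixels, first = np.unique(flat[order], return_index=True)
    nearest = order[first]

    image = np.zeros((h * w, colors.shape[1]), dtype=colors.dtype)
    mask = np.zeros(h * w, dtype=bool)
    image[pixels] = colors[nearest]
    mask[pixels] = True
    return image.reshape(h, w, -1), mask.reshape(h, w)


# Toy usage with synthetic data (random depth and color, not real captures).
rng = np.random.default_rng(0)
h, w = 8, 16
depth = rng.uniform(1.0, 3.0, size=(h, w))
color = rng.random((h, w, 3))
points, colors = build_point_cache(depth, color)

# Rendering from the original position reproduces the input panorama.
same_view, same_mask = render_point_cache(points, colors, (0.0, 0.0, 0.0), h, w)
# Rendering from a shifted position gives a partial view with holes.
new_view, new_mask = render_point_cache(points, colors, (0.3, 0.0, 0.2), h, w)
```

Under these assumptions, the rendered image carries the geometry for each pose on the path, and the mask marks where no cached point landed. In the division of labor the abstract describes, a diffusion model would then refine textures and fill such regions, while the cache keeps the views geometrically consistent.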