TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

2026-06-01 Computer Vision and Pattern Recognition

AI summary

The authors present TROPHIES, a method that jointly reconstructs 4D models of humans, their surroundings, and camera poses from multi-view videos. Unlike previous methods that handle these elements separately or rely on a single view, TROPHIES places everything in one consistent global coordinate frame for better motion and shape accuracy. A Human Branch and a Scene Branch model people and static geometry respectively, and a global alignment and optimization module couples them by enforcing scale consistency, contact priors, and cross-view temporal coherence. Experiments on EgoHuman and EgoExo4D show that TROPHIES produces more globally consistent and physically plausible reconstructions than existing approaches.

4D reconstruction, multi-view videos, human-scene interaction, camera pose estimation, temporal coherence, spatial reasoning, geometry reconstruction, global alignment
Authors
Jinpeng Liu, Yukang Xu, Yutong Li, Xingyu Liu
Abstract
Reconstructing humans and their surrounding environments in a globally consistent 4D space is essential for comprehensive perception. However, prior works typically assume single-view inputs or decouple humans, scenes, and cameras, making them unable to recover coherent geometry, stable motion, and physically aligned trajectories. These limitations motivate us to introduce a new task: unified human-scene-camera reconstruction from multi-view videos, which aims to jointly estimate dynamic humans, static scenes, and camera poses in one global coordinate frame. We propose TROPHIES (Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos), a unified framework tailored for this task. TROPHIES features a Human Branch that models humans through temporal and spatial reasoning, and a Scene Branch that reconstructs static geometry with human-aware attention. A global alignment and optimization module couples both branches by enforcing scale consistency, contact priors, and cross-view temporal coherence. Experiments on EgoHuman and EgoExo4D demonstrate that TROPHIES achieves globally aligned, physically plausible 4D reconstructions and consistently outperforms existing paradigms in both global fidelity and human-scene consistency.
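
As a rough illustration of what placing reconstructions "in one global coordinate frame" with consistent scale can involve, the sketch below aligns one point set to another with a closed-form similarity transform (the Umeyama least-squares method). This is a standard technique swapped in for illustration only. It is not the TROPHIES alignment and optimization module, whose details the abstract does not specify. The function name `similarity_align` and the synthetic points are hypothetical.

```python
import numpy as np


def similarity_align(src, dst):
    """Least-squares similarity transform (Umeyama method).

    Returns scale s, rotation R (3x3), and translation t (3,) such that
    dst[i] is approximately s * R @ src[i] + t for corresponding rows.
    Illustrative helper only; not taken from the TROPHIES paper.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source points.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * (R @ mu_src)
    return s, R, t


if __name__ == "__main__":
    # Synthetic check: points in a "local" frame and the same points in a
    # "global" frame that differs by an assumed scale, rotation, and shift.
    rng = np.random.default_rng(0)
    local_pts = rng.normal(size=(100, 3))
    angle = 0.3
    R_gt = np.array([
        [np.cos(angle), -np.sin(angle), 0.0],
        [np.sin(angle), np.cos(angle), 0.0],
        [0.0, 0.0, 1.0],
    ])
    global_pts = 1.8 * local_pts @ R_gt.T + np.array([0.5, 1.0, -2.0])

    s, R, t = similarity_align(local_pts, global_pts)
    aligned = s * local_pts @ R.T + t
    print("max alignment error:", np.abs(aligned - global_pts).max())
```

A single closed-form alignment like this only resolves a global scale, rotation, and translation between two frames. Contact priors and cross-view temporal coherence, as named in the abstract, would need additional terms in a joint optimization that this sketch does not attempt.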