EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video
2026-06-15 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition • Artificial Intelligence • Robotics
AI summary
The authors created EgoPhys, a system that learns how soft, bendable objects such as elastic materials and fabrics move by watching videos recorded from a person’s point of view. Instead of re-optimizing the stiffness of every spring in its simulation for each new object, EgoPhys uses a compact learned codebook to quickly predict how new objects will behave. It was trained on videos showing a variety of objects and ways of handling them, and it outperforms other methods at predicting how these objects will move and change shape. The authors also tested it on a real robot arm, showing that a model built from just one video can help the robot plan how to handle soft objects.
deformable objects • digital twin • egocentric video • inverse physics • spring stiffness • robotics • simulation • generalizable priors • xArm6 robot • physical modeling
Authors
Hyunjin Kim, Ri-Zhao Qiu, Guangqi Jiang, Xiaolong Wang
Abstract
Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as those of elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs physical digital twins of deformable objects from egocentric RGB-only video using generalizable priors. EgoPhys overcomes the limitations of existing methods and enables controllable deformable digital twin generation from egocentric video: it distills per-object inverse-physics solutions into a compact codebook, which allows it to predict dense spring stiffness fields for unseen objects without per-spring test-time optimization. With generalizable priors learned from diverse egocentric interactions, EgoPhys outperforms baselines in reconstruction, future prediction, and zero-shot generalization. To support training and evaluation, we curate an egocentric interaction dataset covering diverse deformable objects, scenes, and manipulation styles. We deploy EgoPhys on a real xArm6 robot, demonstrating that a digital twin initialized from a single egocentric human play video can serve as an internal world representation for deformable-object planning. These results highlight egocentric RGB observations as a scalable path toward real-to-sim pipelines.
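The abstract does not say how the codebook is built or queried, so the following is only one plausible reading, offered as an illustration rather than as the EgoPhys method. It assumes a spring-mass model with one stiffness value per spring (which "dense spring stiffness fields" suggests), a codebook formed by plain k-means over log-stiffness values taken from per-object inverse-physics fits, and a soft lookup driven by per-spring logits that a hypothetical learned predictor would output. All function names (`spring_forces`, `kmeans_1d`, `build_codebook`, `decode_stiffness`) and the toy numbers are hypothetical.

```python
import math
import random

# Hypothetical sketch, NOT the EgoPhys implementation. Assumptions:
# - the object is a spring-mass system with one stiffness value per spring;
# - the "codebook" is k-means over log-stiffness values from per-object fits;
# - a learned predictor (not shown) outputs per-spring logits over codewords.


def spring_forces(positions, springs, stiffness, rest_lengths):
    """Hooke's-law force on every particle, using a per-spring stiffness."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for (i, j), k, rest in zip(springs, stiffness, rest_lengths):
        d = [positions[j][a] - positions[i][a] for a in range(3)]
        length = math.sqrt(sum(c * c for c in d))
        if length == 0.0:
            continue  # direction undefined for coincident particles
        magnitude = k * (length - rest)  # > 0 when stretched: pulls i toward j
        for a in range(3):
            f = magnitude * d[a] / length
            forces[i][a] += f
            forces[j][a] -= f
    return forces


def kmeans_1d(values, k, iters=50, seed=0):
    """Plain 1-D k-means; returns the cluster centers in ascending order."""
    centers = random.Random(seed).sample(values, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            buckets[nearest].append(v)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [sum(b) / len(b) if b else centers[c] for c, b in enumerate(buckets)]
    return sorted(centers)


def build_codebook(per_object_stiffness, k):
    """Distill per-object stiffness fields into k log-stiffness codewords."""
    logs = [math.log(s) for field in per_object_stiffness for s in field]
    return kmeans_1d(logs, k)


def decode_stiffness(logits_per_spring, codebook):
    """Softmax-weighted codebook lookup in log space, one value per spring."""
    field = []
    for logits in logits_per_spring:
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - top) for x in logits]
        total = sum(weights)
        log_k = sum(w / total * c for w, c in zip(weights, codebook))
        field.append(math.exp(log_k))
    return field


if __name__ == "__main__":
    # Toy per-object "inverse-physics" results: one soft and one stiff object.
    fitted = [[90.0, 110.0, 100.0], [9000.0, 11000.0, 10000.0]]
    codebook = build_codebook(fitted, k=2)
    # Made-up logits standing in for a predictor's output on a new object.
    field = decode_stiffness([[5.0, -5.0], [-5.0, 5.0]], codebook)
    print([round(s) for s in field])
    positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
    print(spring_forces(positions, [(0, 1), (1, 2)], field, [1.0, 1.0]))
```

Combining codewords in log space makes a blend of a soft and a stiff codeword geometric rather than arithmetic, which is a reasonable choice when stiffness spans orders of magnitude; whether EgoPhys does anything similar is not stated in the abstract. The key property the sketch illustrates is that a new object's stiffness field is read out from a fixed codebook instead of being optimized spring by spring.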