PhysInOne: Visual Physics Learning and Reasoning in One Suite

2026-04-10 • Computer Vision and Pattern Recognition

Computer Vision and Pattern RecognitionArtificial IntelligenceMachine LearningRobotics

AI summaryⓘ

The authors created PhysInOne, a very large set of computer-made videos showing physical events, much bigger than previous collections. Their dataset includes many 3D scenes with different objects and covers basic physics ideas like motion, light, fluids, and magnetism. Each video comes with detailed information about the objects and their physical properties. They tested PhysInOne in tasks like predicting future video frames and estimating physical properties, finding it helps models understand physics better but also reveals challenges. This dataset sets a new standard for training AI to learn about the physical world.

synthetic dataset3D scenesphysical phenomenamechanicsopticsfluid dynamicsmagnetismvideo predictionphysical property estimationfoundation models

Authors

Siyuan Zhou, Hejun Wang, Hu Cheng, Jinxi Li, Dongsheng Wang, Junwei Jiang, Yixiao Jin, Jiayue Huang, Shiwei Mao, Shangjia Liu, Yafei Yang, Hongkang Song, Shenxing Wei, Zihui Zhang, Peng Huang, Shijie Liu, Zhengli Hao, Hao Li, Yitian Li, Wenqi Zhou, Zhihan Zhao, Zongqi He, Hongtao Wen, Shouwang Huang, Peng Yun, Bowen Cheng, Pok Kazaf Fu, Wai Kit Lai, Jiahao Chen, Kaiyuan Wang, Zhixuan Sun, Ziqi Li, Haochen Hu, Di Zhang, Chun Ho Yuen, Bing Wang, Zhihua Wang, Chuhang Zou, Bo Yang

Abstract

We present PhysInOne, a large-scale synthetic dataset addressing the critical scarcity of physically-grounded training data for AI systems. Unlike existing datasets limited to merely hundreds or thousands of examples, PhysInOne provides 2 million videos across 153,810 dynamic 3D scenes, covering 71 basic physical phenomena in mechanics, optics, fluid dynamics, and magnetism. Distinct from previous works, our scenes feature multiobject interactions against complex backgrounds, with comprehensive ground-truth annotations including 3D geometry, semantics, dynamic motion, physical properties, and text descriptions. We demonstrate PhysInOne's efficacy across four emerging applications: physics-aware video generation, long-/short-term future frame prediction, physical property estimation, and motion transfer. Experiments show that fine-tuning foundation models on PhysInOne significantly enhances physical plausibility, while also exposing critical gaps in modeling complex physical dynamics and estimating intrinsic properties. As the largest dataset of its kind, orders of magnitude beyond prior works, PhysInOne establishes a new benchmark for advancing physics-grounded world models in generation, simulation, and embodied AI.

View PDFOpen arXiv