Policy-as-Data: Learning Generalizable HOI Diffusion Models from Simulated Physics

2026-06-22
Computer Vision and Pattern Recognition

AI summary

The authors developed a way to generate realistic human-object interactions by using a physics simulator instead of relying only on expensive real-world motion capture data. They train policies with reinforcement learning inside the simulator to produce large amounts of varied, task-oriented interaction data, and then train a generative model on the augmented dataset. A coarse-to-fine retargeting process maps the simplified body model used in the simulator to the standard parametric body models that the generative model requires. According to the authors, the approach generalizes better to unseen objects and to long-horizon generation, while producing interactions that are more diverse and more physically plausible.

Human-Object Interaction, Physics Simulator, Reinforcement Learning, Generative Model, Motion Capture, Task-Oriented Data Generation, Retargeting, Parametric Body Models, Physical Consistency, Long-Horizon Generation
Authors
Shujia Li, Jianshu Hu, Haiyu Zhang, Yunpeng Jiang, Haoyuan Jin, Xinyuan Chen, Yaohui Wang, Yutong Ban
Abstract
Synthesizing realistic Human-Object Interactions (HOI) is critical for creating embodied avatars and functional virtual environments. However, current data-driven approaches rely primarily on motion capture datasets, which are expensive to scale and limited in functional diversity. Models trained on these datasets fail to generalize to unseen objects and fail to maintain physical consistency over long horizons. In this paper, we propose a framework that leverages a physics simulator to overcome the data-scarcity bottleneck in HOI generation. Specifically, we propose a scalable pipeline, called \ours, which uses policies trained with reinforcement learning in a physics simulator for task-oriented data generation and then trains a generative model on the augmented dataset for generalizable HOI generation. To make the synthetic data usable, we introduce a coarse-to-fine retargeting process that bridges the representation gap between the simplified model used in the physics simulator and the standard parametric body models required for generative training. In comprehensive experiments, our method demonstrates improved generalization to unseen objects and the capability for long-horizon generation, while exhibiting greater dynamic diversity and physical plausibility.