SimFoundry: Modular and Automated Scene Generation for Policy Learning and Evaluation

2026-06-26

Robotics
AI summary

The authors present SimFoundry, a system that turns a video of a real-world scene into a simulation-ready digital twin for training and evaluating robot policies. It can also generate variations of a reconstructed scene, called digital cousins, which change objects, scenes, and tasks while preserving affordances. Policies trained on this simulated data transfer zero-shot to real tasks involving multi-step manipulation, articulated objects, and bimanual interaction. Across 7 tasks and 5 policy architectures, simulation results strongly predict real-world performance (mean Pearson correlation 0.911), and training with object, scene, and task cousins improves average real-world success rates by 17%, 21%, and 40%, respectively.

robot policies, real-to-sim, digital twin, zero-shot transfer, manipulation tasks, affordance, simulation, multi-step manipulation, bimanual interaction, policy training
Authors
Nadun Ranawaka, Josiah Wong, Wei-Lin Pai, Wei-Teng Chu, Tianyuan Dai, Masoud Moghani, Hang Yin, Yunfan Jiang, Wesley Durbano, Brandon Huynh, Yu Fang, Linxi Fan, Danfei Xu, Ruohan Zhang, Li Fei-Fei, Bowen Wen, Ajay Mandlekar, Yuke Zhu
Abstract
Training and evaluating robot policies in the real world is costly and difficult to scale. We introduce SimFoundry, a modular and automated system for zero-shot real-to-sim scene construction from a video. SimFoundry generates sim-ready digital twins and supports object, scene, and task editing, enabling the automated generation of diverse digital cousins: affordance-preserving variations of reconstructed real-world scenes. Policies trained on SimFoundry data transfer zero-shot to challenging real tasks involving multi-step manipulation, articulated object interaction, and bimanual interaction, and its digital cousins (variations of the original scene, objects, and tasks) facilitate generalization to new real-world conditions. Across 7 manipulation tasks and 5 policy architectures, SimFoundry simulation evaluations strongly predict real-world performance, with mean Pearson correlation 0.911 and mean maximum ranking violation 0.018. When evaluating sim-trained policies zero-shot in the real world, policies trained with object, scene, and task cousins in simulation show average task success rate improvements of 17%, 21%, and 40%, respectively. Additional details at https://research.nvidia.com/labs/gear/simfoundry/ .
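The two sim-to-real agreement metrics quoted in the abstract can be illustrated with a minimal Python sketch. The abstract does not define mean maximum ranking violation, so the version below assumes the definition commonly used in prior sim-to-real evaluation work: for each policy, take the largest real-world performance gap to any other policy whose relative order differs between simulation and reality, then average over policies. The function names and the success rates in the usage lines are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(sim, real):
    """Pearson correlation between paired sim and real success rates.

    Assumes both lists have the same length and neither is constant.
    """
    n = len(sim)
    mean_sim = sum(sim) / n
    mean_real = sum(real) / n
    cov = sum((s - mean_sim) * (r - mean_real) for s, r in zip(sim, real))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_real = sum((r - mean_real) ** 2 for r in real)
    return cov / sqrt(var_sim * var_real)


def mean_max_rank_violation(sim, real):
    """Mean maximum ranking violation (assumed definition, see lead-in).

    A pair (i, j) is a violation when simulation and reality disagree on
    which policy is better; its size is the real-world performance gap.
    """
    n = len(sim)
    worst_per_policy = []
    for i in range(n):
        worst = 0.0
        for j in range(n):
            order_differs = (sim[i] < sim[j]) != (real[i] < real[j])
            if order_differs:
                worst = max(worst, abs(real[i] - real[j]))
        worst_per_policy.append(worst)
    return sum(worst_per_policy) / n


# Hypothetical success rates for five policies (illustrative only).
sim_success = [0.20, 0.45, 0.60, 0.75, 0.90]
real_success = [0.15, 0.50, 0.55, 0.80, 0.85]

print("Pearson r:", pearson(sim_success, real_success))
print("MMRV:", mean_max_rank_violation(sim_success, real_success))
```

Under this definition, a high correlation together with a violation score near zero means that simulation both tracks real-world success rates and ranks policies in the same order as real-world trials would.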