SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image
2026-06-02 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Robotics
AI summary
The authors developed SimuScene, a method for turning a single picture into a 3D scene where objects behave realistically when simulated. Instead of fixing object positions after the fact, they use a physics engine during the creation process to spot and correct problems like objects sinking or overlapping. This method adjusts shapes and positions so everything stays stable under gravity. Their tests show improved stability and accuracy, and they demonstrate how these scenes help with robot control tasks.
3D reconstruction, single-image lifting, physics engine, simulation-ready scenes, robotic manipulation, shape estimation, gravity simulation, compositional scene, geometric alignment, physical stability
Authors
Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim, Hyunsoo Cha, Hanbyul Joo
Abstract
Reconstructing interactive, simulation-ready 3D scenes from a single image is a critical bottleneck for robotic manipulation. While recent single-image lifters recover plausible per-object shapes, composing them yields scenes that collapse under physical simulation due to interpenetrating, hovering, or sinking objects. Existing physics-aware methods treat this strictly as a post-hoc layout correction, leaving the underlying geometric errors unresolved. To close this gap, we introduce SimuScene, a compositional 3D reconstruction pipeline that puts physics in the loop of shape and layout estimation. Rather than using physics merely for layout cleanup, we use the physics engine as a diagnostic measurement tool during the generative process itself. By simulating reconstructed objects under gravity, we convert penetration and support failures into quantitative correction signals that drive gravity-axis stretching and amodal shape resampling. This physics-informed feedback loop mitigates accumulated reconstruction errors and produces a stable, simulation-ready compositional 3D scene. Extensive experiments demonstrate state-of-the-art performance on physical stability and geometric alignment benchmarks. We further highlight SimuScene's utility by deploying reconstructed environments in humanoid control and robot-arm manipulation tasks.
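The feedback idea in the abstract (measure support and penetration failures, then turn them into corrections) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single vertical stack of objects, reduces each object to its extent along the gravity axis, and replaces the physics-engine simulation with a simple signed-gap check. The names `Box`, `support_gap`, `diagnose`, and `correct_scene` are hypothetical, as is the rule that hovering triggers a stretch while penetration is only flagged for shape resampling.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Toy object: only its extent along the gravity (z) axis is modeled."""
    name: str
    z_min: float
    z_max: float


def support_gap(obj: Box, support_top: float) -> float:
    """Signed gap between the object's bottom and its support surface.

    Positive means the object hovers; negative means it penetrates the support.
    """
    return obj.z_min - support_top


def diagnose(gap: float, tol: float = 1e-3) -> str:
    """Map the measured gap to a (hypothetical) correction action."""
    if gap > tol:
        return "stretch"    # hovering: extend the shape down to the support
    if gap < -tol:
        return "resample"   # penetrating: flag the shape for resampling
    return "stable"


def correct_scene(objects, floor_z: float = 0.0, tol: float = 1e-3):
    """Walk the stack bottom-up and apply the toy corrections.

    Assumes every object rests on the floor or on the object just below it.
    Returns a list of (possibly corrected Box, action) pairs.
    """
    results = []
    support_top = floor_z
    for obj in sorted(objects, key=lambda b: b.z_max):
        action = diagnose(support_gap(obj, support_top), tol)
        if action == "stretch":
            # Keep the observed top fixed; move the bottom face onto the support.
            obj = Box(obj.name, support_top, obj.z_max)
        results.append((obj, action))
        support_top = obj.z_max
    return results


if __name__ == "__main__":
    scene = [Box("mug", 0.70, 0.85), Box("table", 0.05, 0.75)]
    for box, action in correct_scene(scene):
        print(box, action)
```

In this sketch the hovering table is stretched down to the floor, while the mug that sinks into the table top is left unchanged and only flagged. A real pipeline would obtain these signals from simulated contacts and settling behavior rather than from a one-dimensional gap.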