PerceptTwin: Semantic Scene Reconstruction for Iterative LLM Planning and Verification
2026-06-02 • Robotics
Robotics, Artificial Intelligence
AI summary
The authors created PerceptTwin, a tool that automatically builds interactive simulations from what a robot perceives in its environment. These simulations are used to check and improve the robot's plans before the robot executes them. PerceptTwin also uses a language model as a judge of whether plans are correct and follow human preferences. In the authors' tests, PerceptTwin improved plan success by about 39% on average across GPT5, GPT5Mini, and GPT5Nano planners, and helped humans spot plan mistakes by up to 18% on average for plans that fail because a skill precondition is not satisfied. This work suggests that building simulations from robot perception can make robot planning safer and more reliable.
robot policy learning, simulation environment, semantic scene representation, 3D asset generation, affordance prediction, LLM (large language model), plan verification, AI alignment, robot planning
Authors
Charlie Gauthier, Sacha Morin, Liam Paull
Abstract
Simulation environments are useful for robot policy learning and for plan verification and validation. Traditionally, creating a simulation has been onerous, and building a bespoke simulation for every environment a robot might operate in has been infeasible. In this work, we introduce PerceptTwin, a fully automatic pipeline that constructs interactive simulations directly from semantic scene representations produced by a robot's perception stack. PerceptTwin combines open-vocabulary object maps with 3D asset generation, affordance prediction, and commonsense condition checking. These interactive simulations can be used to validate and refine plans before they are executed on the robot hardware. Borrowing from the AI alignment literature, we also introduce an LLM judge that verifies plan correctness and alignment with human preferences. Experiments show that PerceptTwin feedback allows LLM planners to refine plans, enhance safety, and resist harmful black-box prompting attacks. In our suite of tasks, PerceptTwin improves plan success by an average of approximately 39% for GPT5, GPT5Mini, and GPT5Nano planners. PerceptTwin also improves human plan verification by up to 18% on average for plans that fail due to unfulfilled skill preconditions. Our results demonstrate the potential of open-vocabulary scene simulation from robot perception as a foundation for safer, more reliable robot planning.
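The validate-and-refine loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how skills, preconditions, or simulator feedback are represented, so the `Skill` class, the symbolic state facts, `simulate_plan`, `refine`, and `toy_planner` below are all hypothetical names chosen for illustration. A set of string facts stands in for the simulated scene, and a hard-coded function stands in for the LLM planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill: symbolic preconditions plus add/delete effects."""
    name: str
    preconditions: frozenset
    add_effects: frozenset = frozenset()
    del_effects: frozenset = frozenset()


def simulate_plan(initial_state, plan):
    """Roll the plan forward on a copy of the state.

    Returns (success, feedback). On failure, the feedback names the first
    step whose preconditions are not satisfied.
    """
    state = set(initial_state)
    for step, skill in enumerate(plan):
        missing = skill.preconditions - state
        if missing:
            return False, (
                f"step {step} ({skill.name}): "
                f"unfulfilled preconditions {sorted(missing)}"
            )
        state -= skill.del_effects
        state |= skill.add_effects
    return True, "plan executed in simulation"


def refine(planner, initial_state, max_rounds=3):
    """Ask the planner for a plan, simulate it, and feed failures back."""
    feedback = None
    for _ in range(max_rounds):
        plan = planner(feedback)
        ok, feedback = simulate_plan(initial_state, plan)
        if ok:
            return plan
    return None  # no valid plan found within the round budget


# Toy scene and skills (assumed for illustration only).
OPEN_DRAWER = Skill(
    "open_drawer",
    preconditions=frozenset({"drawer_closed"}),
    add_effects=frozenset({"drawer_open"}),
    del_effects=frozenset({"drawer_closed"}),
)
PICK_CUP = Skill(
    "pick_cup",
    preconditions=frozenset({"drawer_open", "gripper_empty"}),
    add_effects=frozenset({"holding_cup"}),
    del_effects=frozenset({"gripper_empty"}),
)
INITIAL = {"drawer_closed", "gripper_empty"}


def toy_planner(feedback):
    """Stand-in for an LLM planner: repairs its plan once it sees feedback."""
    if feedback is None:
        return [PICK_CUP]
    return [OPEN_DRAWER, PICK_CUP]


if __name__ == "__main__":
    final_plan = refine(toy_planner, INITIAL)
    print([skill.name for skill in final_plan])
```

In this sketch the first proposed plan fails because the drawer is still closed, the failure message is passed back to the planner, and the second proposal passes simulation. A real system would replace the symbolic state with the generated interactive scene and the toy planner with an LLM call.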