RePlan-Bot: Multi-Level Replanning for Embodied Instruction Following

2026-05-25 · Robotics

AI summary

The authors present RePlan-Bot, an agent that follows complex natural language instructions in interactive 3D environments. It replans continuously at multiple levels while carrying out a task: a language-model auditor adjusts sub-goals based on feedback from the environment, a commonsense-guided search over a multi-layered instance map locates objects, and a lightweight ViT-based corrector fixes risky low-level actions before they are executed. On the ALFRED benchmark, it reports state-of-the-art performance in both seen and unseen environments.

Embodied Instruction Following, Long-Horizon Planning, Large Language Models, Sub-goal Adjustment, Instance Map, Vision Transformer, ALFRED Benchmark, 3D Interactive Environments, Task Replanning
Authors
Xicheng Gong, Guozheng Sun, Peiran Xu, Yadong Mu
Abstract
Embodied instruction following (EIF) requires agents to understand and execute complex natural language commands within interactive 3D environments. Despite recent advances, existing methods often fail in long-horizon planning and handling irreversible state changes, resulting in low task success rates. To address these challenges, we introduce RePlan-Bot, a novel EIF agent that performs multi-level, continuous replanning throughout task execution. RePlan-Bot integrates a high-level LLM-based auditor for dynamic sub-goal adjustments guided by environmental feedback, a commonsense-guided search mechanism based on a multi-layered instance map for precise and structured object localization, and a lightweight ViT-based corrector to preemptively fix risky low-level actions. Evaluated on the ALFRED benchmark, RePlan-Bot achieves state-of-the-art performance in both seen and unseen environments, demonstrating superior adaptability and reliability.
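
The three levels named in the abstract (sub-goal auditing, map-based object search, and low-level action correction) can be pictured as a single control loop. The sketch below is only an illustration under strong assumptions, not the authors' implementation: the function names (`locate`, `run_episode`), the callback signatures, the flat dictionary standing in for the multi-layered instance map, and the toy fridge scenario are all hypothetical. In the paper, the auditor is LLM-based and the corrector is ViT-based; here both are plain Python callbacks.

```python
from typing import Callable, Dict, List, Optional, Tuple

SubGoal = Tuple[str, str]   # (action, target object); hypothetical representation
Position = Tuple[int, int]  # hypothetical 2D map cell


def locate(instance_map: Dict[str, Position], target: str,
           likely_receptacles: Dict[str, List[str]]) -> Optional[Position]:
    """Return the target's mapped position, else the position of a likely container."""
    if target in instance_map:
        return instance_map[target]
    # Commonsense prior (assumed to be given): where is this object usually found?
    for receptacle in likely_receptacles.get(target, []):
        if receptacle in instance_map:
            return instance_map[receptacle]
    return None


def run_episode(plan: List[SubGoal],
                instance_map: Dict[str, Position],
                likely_receptacles: Dict[str, List[str]],
                audit: Callable[[List[SubGoal], str], List[SubGoal]],
                correct: Callable[[str, str, Optional[Position]], str],
                execute: Callable[[str, str, Optional[Position]], Tuple[bool, str]],
                max_steps: int = 20) -> List[Tuple[str, str, bool]]:
    """Run sub-goals in order, replanning at each level when needed."""
    plan = list(plan)
    trace = []
    for _ in range(max_steps):
        if not plan:
            break
        action, target = plan[0]
        position = locate(instance_map, target, likely_receptacles)  # map-level search
        action = correct(action, target, position)                   # low-level correction
        ok, feedback = execute(action, target, position)
        trace.append((action, target, ok))
        if ok:
            plan.pop(0)
        else:
            plan = audit(plan, feedback)                             # high-level replanning
    return trace


# Toy scenario (hypothetical): the apple is inside a closed fridge.
fridge = {"open": False}

def toy_audit(plan, feedback):
    # Stand-in for the LLM auditor: insert a missing prerequisite sub-goal.
    return [("OpenObject", "Fridge")] + plan if feedback == "closed" else plan

def toy_correct(action, target, position):
    # Stand-in for the ViT corrector: refuse to act on an unlocated target.
    return action if position is not None else "Explore"

def toy_execute(action, target, position):
    if action == "OpenObject" and target == "Fridge":
        fridge["open"] = True
        return True, "ok"
    if action == "PickupObject" and target == "Apple":
        return (True, "ok") if fridge["open"] else (False, "closed")
    return False, "unknown"

trace = run_episode([("PickupObject", "Apple")], {"Fridge": (2, 3)},
                    {"Apple": ["Fridge"]}, toy_audit, toy_correct, toy_execute)
print(trace)
```

In this toy run the first pickup attempt fails because the fridge is closed, the auditor prepends an open-fridge sub-goal, and the retried pickup then succeeds. The loop leaves out the preemptive, perception-driven character of the corrector and the handling of irreversible state changes that the abstract emphasizes.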