Any House Any Task: Scalable Long-Horizon Planning for Abstract Human Tasks
2026-02-12 • Robotics
AI summary
The authors developed AHAT, a robot planner designed to handle long and complex household tasks based on short, unclear instructions. AHAT uses a specialized language model to translate these instructions and descriptions of the environment into small, clear goals that a robot can understand and act on. They also created a new learning method called TGPO to help AHAT better break down tricky instructions by correcting its reasoning steps during training. Tests show AHAT performs better than previous methods for planning detailed tasks in large home settings.
Large Language Models • Task Planning • Planning Domain Definition Language • Reinforcement Learning • Symbolic Reasoning • Long-horizon Planning • Textual Scene Graphs • Group Relative Policy Optimization • Household Robotics • Instruction Ambiguity
Authors
Zhihong Liu, Yang Li, Rengming Huang, Cewu Lu, Panpan Cai
Abstract
Open-world, language-conditioned task planning is crucial for robots operating in large-scale household environments. While many recent works attempt to address this problem using Large Language Models (LLMs) via prompting or training, a key challenge remains scalability: performance often degrades rapidly with increasing environment size, plan length, instruction ambiguity, and constraint complexity. In this work, we propose Any House Any Task (AHAT), a household task planner optimized for long-horizon planning in large environments given ambiguous human instructions. At its core, AHAT uses an LLM trained to map task instructions and textual scene graphs into grounded subgoals defined in the Planning Domain Definition Language (PDDL). These subgoals are then solved through explicit symbolic reasoning to generate feasible and optimal long-horizon plans. To enhance the model's ability to decompose complex and ambiguous intentions, we introduce TGPO, a novel reinforcement learning algorithm that integrates external correction of intermediate reasoning traces into Group Relative Policy Optimization (GRPO). Experiments demonstrate that AHAT achieves significant performance gains over state-of-the-art prompting, planning, and learning methods, particularly on human-style household tasks characterized by brief instructions but complex execution plans.
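To make the pipeline concrete, here is a minimal sketch of the kind of instruction-to-PDDL flow the abstract describes: an LLM turns an instruction plus a textual scene graph into a grounded PDDL goal, which a classical planner then solves. The function names, prompt format, and the choice of Fast Downward as the planner are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an AHAT-style pipeline: LLM grounding into PDDL subgoals,
# then symbolic planning. All names here are hypothetical.
import subprocess
import textwrap


def scene_graph_to_text(scene_graph: dict) -> str:
    """Flatten a {relation: [(a, b), ...]} scene graph into predicate text."""
    lines = []
    for relation, pairs in scene_graph.items():
        for a, b in pairs:
            lines.append(f"({relation} {a} {b})")
    return "\n".join(lines)


def build_prompt(instruction: str, scene_text: str) -> str:
    """Assumed prompt format pairing the scene with the instruction."""
    return textwrap.dedent(f"""\
        Scene graph:
        {scene_text}

        Instruction: {instruction}

        Output a grounded PDDL :goal expression that satisfies the instruction.""")


def instruction_to_pddl_goal(instruction: str, scene_graph: dict, llm) -> str:
    """`llm` is any callable str -> str, e.g. a wrapper around a fine-tuned model."""
    return llm(build_prompt(instruction, scene_graph_to_text(scene_graph)))


def solve_with_planner(domain_file: str, problem_file: str) -> str:
    """Fast Downward is one off-the-shelf optimal PDDL planner; any planner
    producing feasible, optimal plans would fill this role."""
    result = subprocess.run(
        ["fast-downward.py", domain_file, problem_file,
         "--search", "astar(lmcut())"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

In this framing, the LLM only has to emit a compact symbolic goal; feasibility and optimality over long horizons are delegated to the planner, which is what keeps the approach scalable as environments grow.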
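The abstract gives only the outline of TGPO, so the following is a guess at its shape: standard GRPO group-relative advantages, plus a hook where an external corrector repairs flawed intermediate reasoning traces before the policy update. Only the normalized group advantage is standard GRPO; the `correct_traces` interface is hypothetical.

```python
# Group-relative advantages as in GRPO, with a hypothetical TGPO-style
# external trace-correction step. The Rollout fields and corrector
# interface are assumptions based on the abstract's description.
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Rollout:
    reasoning_trace: list[str]  # intermediate reasoning steps
    pddl_goal: str              # final grounded subgoal
    reward: float               # e.g. planner success or plan quality


def group_advantages(rollouts: list[Rollout], eps: float = 1e-6) -> list[float]:
    """GRPO normalizes each reward against its own sampling group,
    removing the need for a learned value function."""
    rewards = [r.reward for r in rollouts]
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


def correct_traces(rollouts: list[Rollout], corrector) -> list[Rollout]:
    """Hypothetical TGPO step: an external `corrector` (a symbolic checker,
    a stronger model, etc.) repairs faulty intermediate reasoning before
    the policy update, so the policy learns from corrected traces."""
    return [corrector(r) for r in rollouts]
```

Under this reading, the appeal of correcting traces rather than merely rewarding outcomes is denser supervision on the decomposition steps themselves, which is where ambiguous instructions are most likely to derail a plan.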