Robust Asynchronous Planning via Auto-Formalization

2026-05-31 • Computation and Language

AI summary

The authors study how large language models (LLMs) plan asynchronous tasks, where actions have different durations, can run concurrently, and are subject to constraints that may change during execution. They compare two approaches: generating action sequences directly (Planner) and translating the task into a formal language for an external solver (Formalizer). As dependency graphs grow from 5 to 100 actions, the Planner and the PDDL2.1 Formalizer lose almost all accuracy, while the CP-SAT Formalizer stays far more accurate. When planning constraints are updated during execution, accuracy drops sharply for all three, but a state-aware repair strategy that updates only the constraints induced by the new event restores most of the CP-SAT Formalizer's accuracy.

Large Language Models, Asynchronous Planning, Formalizer, Planner, PDDL2.1, CP-SAT, Constraint Satisfaction, Task Planning, Execution-time Constraints

Authors

Jiayi Zhang, Jianing Yin, Ben Zhou, Li Zhang

Abstract

LLMs can plan either by generating action sequences directly as a Planner or by translating tasks into a domain-specific language for an external solver as a Formalizer. While most real-world tasks are asynchronous, with non-uniform durations, concurrency, and execution-time constraints, existing benchmarks hardly cover them. We unify these asynchronous planning challenges under a single formulation and introduce the first three benchmarks that address each at scale. We conclude that the choice of formal representation primarily determines whether planning scales: as dependency graphs grow from 5 to 100 actions, Planner collapses from 96% to 5% plan accuracy and PDDL2.1 Formalizer from 13% to 0%, while CP-SAT Formalizer averages 94% and still achieves 83% at 100 actions. Faithfulness diagnostics show that, compared to general constraint satisfaction programs, PDDL2.1's predicate-based planning representation becomes brittle when LLMs must keep predicates, effects, and goals consistent. Execution-time updates of planning constraints further degrade performance sharply (Planner 23.9%, PDDL2.1 0.7%, CP-SAT 46.1%), but a state-aware repair strategy that updates only event-induced constraints recovers CP-SAT Formalizer to 84.5%.
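To make "non-uniform durations" and "concurrency" concrete, the sketch below computes an earliest-start schedule for a small dependency graph. It is not the authors' benchmark format and it does not use CP-SAT. It is a plain longest-path pass in topological order, which gives the optimal makespan only in the simplified case assumed here: precedence constraints alone and unlimited concurrency. The function name `earliest_schedule` and the toy cooking task are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def earliest_schedule(durations, deps):
    """Return (start_times, makespan) for a precedence-only task graph.

    durations: maps each action to its duration.
    deps: maps an action to the set of actions that must finish first.
    Assumption: any number of actions may run at the same time.
    """
    graph = {action: deps.get(action, set()) for action in durations}
    start = {}
    # static_order() yields every action after all of its prerequisites.
    for action in TopologicalSorter(graph).static_order():
        # An action may begin once its slowest prerequisite has ended.
        start[action] = max(
            (start[p] + durations[p] for p in graph[action]),
            default=0,
        )
    makespan = max((start[a] + durations[a] for a in durations), default=0)
    return start, makespan


# Hypothetical toy task, not taken from the paper's benchmarks.
durations = {"boil_water": 10, "chop_vegetables": 5, "cook_soup": 20}
deps = {"cook_soup": {"boil_water", "chop_vegetables"}}
start, makespan = earliest_schedule(durations, deps)
```

Here boiling and chopping overlap, so the makespan is 30 rather than the 35 a strictly sequential plan would need. Constraints beyond simple precedence, such as limited resources or conditions that change during execution, are where a general constraint solver becomes necessary.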
