RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

2026-05-01 · Machine Learning

Machine Learning · Computation and Language · Multiagent Systems
AI summary

The authors developed RunAgent, a system that helps large language models (LLMs) follow step-by-step plans more reliably. RunAgent uses explicit control instructions such as IF and GOTO to manage plan steps, and checks each step's output for correctness using constraints derived from the task description. It decides at each step whether to reason, use tools, or write and run code, and it corrects errors along the way. Evaluations show RunAgent interprets and executes plans more accurately than comparable baseline methods.

large language models, plan execution, natural language processing, control constructs, constraint validation, tool usage, code generation, error correction, workflow automation, multi-agent systems
Authors
Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus
Abstract
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges the expressiveness of natural language with the determinism of programming via an agentic language with explicit control constructs (e.g., IF, GOTO, FORALL). Beyond syntactic and semantic verification of each step's output, performed according to that step's specific instruction, RunAgent autonomously derives and validates constraints from the description of the task and its instance at each step. RunAgent also dynamically selects among LLM-based reasoning, tool usage, and code generation and execution (e.g., in Python), and incorporates error-correction mechanisms to ensure correctness. Finally, RunAgent filters the context history, retaining only information relevant to the execution of the current step. Evaluations on the Natural Plan and SciBench datasets demonstrate that RunAgent outperforms baseline LLMs and state-of-the-art PlanGEN methods.
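To make the abstract's "agentic language with explicit control constructs" concrete, here is a minimal sketch of a plan interpreter that honors IF/GOTO jumps and runs a per-step validation hook. This is not the authors' implementation: the step schema, the `run_plan` function, and the `validate` callback are all hypothetical illustrations of stepwise, constraint-checked execution.

```python
def run_plan(steps, state, validate=None):
    """Execute plan steps in order, honoring IF/GOTO control flow.

    steps: list of dicts with optional keys:
      - "label": name usable as a GOTO target
      - "op":    callable(state) -> new state (the step's action)
      - "if":    predicate(state) -> bool guarding the jump
      - "goto":  label to jump to (unconditionally if "if" is absent)
    validate: optional callable(step, state) that raises on a
              constraint violation (stepwise verification hook)
    """
    labels = {s["label"]: i for i, s in enumerate(steps) if s.get("label")}
    i = 0
    while i < len(steps):
        step = steps[i]
        if step.get("op"):
            state = step["op"](state)
            if validate:
                validate(step, state)  # check constraints after each step
        cond = step.get("if")
        if step.get("goto") is not None and (cond is None or cond(state)):
            i = labels[step["goto"]]   # GOTO: jump to the labeled step
        else:
            i += 1                     # fall through to the next step
    return state

# Toy plan: keep doubling x while it is still <= 10 (IF ... GOTO loop)
plan = [
    {"label": "start", "op": lambda s: {**s, "x": s["x"] * 2}},
    {"if": lambda s: s["x"] <= 10, "goto": "start"},
]
result = run_plan(plan, {"x": 1})  # x: 1 -> 2 -> 4 -> 8 -> 16
```

In RunAgent each step's action would be far richer (LLM reasoning, a tool call, or generated code) and the validator would apply task-derived constraints rather than a fixed predicate, but the control skeleton is the same: deterministic sequencing with explicit jumps layered over natural-language step bodies.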