Capability-Aligned Hierarchical Learning for Tool-Augmented LLMs

2026-06-08 | Artificial Intelligence

AI summary

The authors study how large language models (LLMs) can use external tools more effectively by improving how they plan and execute tasks. They point out that previous hierarchical methods optimized the planner and the tool-calling executor separately, so the planner could produce sub-tasks that the executor was poorly matched to, which hurt performance. To address this, the authors propose Capability-Aligned Hierarchical Learning (CAHL), which trains the planning and execution policies jointly with reinforcement learning. Their experiments on API-Bank, BFCL, and Bamboogle indicate that CAHL is effective on both constrained and open-ended tool-use tasks.

Large Language Models, Tool Learning, Hierarchical Policy, High-level Planner, Low-level Executor, Reinforcement Learning, Task Decomposition, API-Bank, BFCL, Bamboogle
Authors
Haotong Yang, Ting Long, Yi Chang
Abstract
Tool learning enables LLMs to invoke external tools to accomplish tasks. Prior studies have demonstrated the effectiveness of a hierarchical structure: a high-level policy handles global planning and decomposes tasks into manageable sub-tasks, and a low-level policy focuses on invoking tools to solve these sub-tasks. However, these works typically optimize the high-level and low-level policies separately, which leads to planner-executor misalignment and limits LLM performance on tool-use tasks. In this paper, we propose Capability-Aligned Hierarchical Learning (CAHL), which leverages reinforcement learning with verifiable rewards (RLVR) to jointly optimize both policies, enabling better alignment between the high-level planner and the low-level executor. Experiments on constrained tool-use benchmarks (API-Bank and BFCL) and an open-ended environment (Bamboogle) demonstrate the effectiveness of CAHL.
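
The planner-executor structure described in the abstract can be illustrated with a toy sketch. This is not the CAHL implementation: the abstract gives no algorithmic details, so the tool registry, the `plan` and `execute` stand-ins, and the choice to credit one shared verifiable reward to both levels are all assumptions made for illustration. In a real system both functions would be LLM policies updated by an RL algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; names and behaviours are illustrative only.
TOOLS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


@dataclass
class SubTask:
    tool: str
    args: Tuple[float, float]


def plan(task: List[Tuple[str, float, float]]) -> List[SubTask]:
    """Stand-in for the high-level policy: decompose a task into sub-tasks."""
    return [SubTask(tool, (a, b)) for tool, a, b in task]


def execute(sub_task: SubTask) -> float:
    """Stand-in for the low-level policy: invoke one tool for one sub-task."""
    return TOOLS[sub_task.tool](*sub_task.args)


def verifiable_reward(outputs: List[float], expected: List[float]) -> float:
    """Binary reward that can be checked automatically against a reference."""
    return 1.0 if outputs == expected else 0.0


def rollout(task: List[Tuple[str, float, float]], expected: List[float]) -> dict:
    """Run planner then executor, and score the whole trajectory once.

    Assumption: the same reward is credited to both levels, so the planner
    is rewarded only for plans the executor can actually carry out.
    """
    sub_tasks = plan(task)
    outputs = [execute(s) for s in sub_tasks]
    reward = verifiable_reward(outputs, expected)
    return {"planner_reward": reward, "executor_reward": reward, "outputs": outputs}


if __name__ == "__main__":
    print(rollout([("add", 1, 2), ("mul", 3, 4)], expected=[3, 12]))
```

Sharing one trajectory-level reward is only one plausible way to couple the two policies; the paper's actual reward design and optimization procedure may differ.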