A Stackelberg Framework for Resource-Aware LLM Agents: Learning, Repair, and Conditional Guarantees
2026-06-22 • Artificial Intelligence
AI summary
The authors study how multi-turn large language model (LLM) agents can manage limited resources while balancing quality and cost. They model this as a Stackelberg game in which a controller sets a quality target and a cost incentive, and the LLM executor decides how much resource to spend on context, prompts, and tools. They learn a policy in a simulated environment and then repair it using real API calls. In a 300-turn real-API experiment, the repaired controller reduces mean token cost by about 17% relative to a conservative baseline, with no statistically significant change in measured quality. The theoretical guarantees hold only under stated conditions, and the experiments indicate a promising operating point rather than a certified equilibrium.
Keywords: large language model, resource allocation, Stackelberg game, context management, prompt design, tool usage, policy learning, real-API calibration, equilibrium, token cost
Authors
Baoxun Wang
Abstract
Large language model (LLM) agents increasingly operate as multi-turn systems that must allocate context, prompt verbosity, and tool access under finite computational budgets. Static thresholds are simple, but they are brittle under heterogeneous tasks and evolving session states. We formulate resource governance as a contextual Stackelberg game: a controller commits to a quality target and a cost incentive, while an executor responds with resource actions over context, prompting, and tool usage. We learn a conditional response model, optimize a leader policy against that model, and repair the resulting policy using real-API calibration and projection onto an empirically selected action set. For the restricted game, we establish conditional guarantees for equilibrium existence, follower-response stability, safe-set projection, and transfer from a surrogate environment to the real environment under bounded value error. The primary real-API experiment comprises 300 evaluated turns. Relative to a conservative baseline, the selected repaired controller reduces mean token cost by 17.4% (Welch $p=0.022$), while the measured quality difference is not statistically significant ($p=0.44$). The theoretical results are conditional and the experiments do not estimate their regret or transfer constants; consequently, the evidence establishes a promising repaired operating point, not a certified real-system equilibrium.
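To make the leader-follower structure concrete, the following is a minimal sketch of the three steps named in the abstract: a follower best response, leader optimization against that response, and projection onto a selected action set. It is an illustration under stated assumptions, not the paper's implementation. All names (`LeaderAction`, `FollowerAction`, `best_response`, `optimize_leader`, `project`), the toy `cost` and `quality` functions, the shortfall penalty of 10, and the token price of 0.05 are hypothetical. In particular, the learned conditional response model is replaced here by a hand-written utility.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LeaderAction:
    quality_target: float  # quality level the controller commits to
    cost_incentive: float  # price the executor pays per unit of resource


@dataclass(frozen=True)
class FollowerAction:
    context: int  # context level (0 = minimal)
    prompt: int   # prompt verbosity level
    tools: int    # tool-access level


# Hypothetical discrete executor action space: 3 levels per resource.
FOLLOWER_ACTIONS = [FollowerAction(c, p, t) for c, p, t in product(range(3), repeat=3)]


def cost(a: FollowerAction) -> int:
    """Toy token cost: one unit per resource level (an assumption)."""
    return a.context + a.prompt + a.tools


def quality(a: FollowerAction) -> float:
    """Toy quality with diminishing returns in spent resources (an assumption)."""
    return 1.0 - 0.5 ** cost(a)


def best_response(leader: LeaderAction) -> FollowerAction:
    """Executor picks the action maximizing its own utility given the commitment."""
    def utility(a: FollowerAction) -> float:
        shortfall = max(0.0, leader.quality_target - quality(a))
        return -10.0 * shortfall - leader.cost_incentive * cost(a)
    return max(FOLLOWER_ACTIONS, key=utility)


def leader_value(leader: LeaderAction, token_price: float = 0.05) -> float:
    """Controller's payoff, evaluated at the executor's best response."""
    a = best_response(leader)
    return quality(a) - token_price * cost(a)


def optimize_leader(grid: list[LeaderAction]) -> LeaderAction:
    """Grid search over leader commitments, anticipating the follower."""
    return max(grid, key=leader_value)


def project(leader: LeaderAction, safe_set: list[LeaderAction]) -> LeaderAction:
    """Map a leader action to the nearest member of a selected action set."""
    def sq_dist(s: LeaderAction) -> float:
        return ((s.quality_target - leader.quality_target) ** 2
                + (s.cost_incentive - leader.cost_incentive) ** 2)
    return min(safe_set, key=sq_dist)


LEADER_GRID = [LeaderAction(q, lam) for q in (0.5, 0.75, 0.9) for lam in (0.0, 0.1, 0.5)]
SAFE_SET = [LeaderAction(0.75, 0.1), LeaderAction(0.5, 0.0)]

chosen = optimize_leader(LEADER_GRID)
repaired = project(chosen, SAFE_SET)
```

In this sketch the safe set is fixed by hand. In the setting described above it would be selected empirically from real-API calibration, and the repaired policy would then be evaluated against a baseline on real turns.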