LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

2026-06-18

Artificial Intelligence, Computation and Language
AI summary

The authors observe that tool-calling customer-service agents often fail because task state is not tracked separately from the prompt, which leads to decisions based on stale or missing information and to policy violations. They propose LedgerAgent, an inference-time method that keeps observed task state in a separate record (a ledger), renders it into the prompt, and checks state-dependent policy rules before executing environment-changing tool calls. Across four customer-service domains, LedgerAgent improves average pass^k over a standard prompt-based approach, with the largest gains when success is required consistently across multiple trials.

tool-calling agents, task state, customer service, prompt-based agents, policy adherence, state management, inference-time method, LedgerAgent, consistency metrics, environment-changing tool calls
Authors
Md Nayem Uddin, Amir Saeidi, Eduardo Blanco, Chitta Baral
Abstract
Policy-adherent tool-calling agents in customer-service domains must maintain task states across turns while calling tools and obeying domain policies. Task states consist of relevant facts, identifiers, constraints, and conditions observed through user interaction and tool calls. In standard agents, task states are not represented separately. Observations, tool returns, and policy instructions are placed in the prompt, leaving agents to reconstruct the relevant states from the prompt each time they decide what to do next. This design makes state management implicit, creating two common failure modes. An agent may retrieve the right facts but later ground its decision in stale, missing, or incorrect information; and a syntactically valid tool call may still violate a domain policy that depends on the current task state. We introduce LedgerAgent, an inference-time method for tool-calling agents that maintains observed task states in a separate ledger and renders the states into the prompt. The ledger is also used to check state-dependent policy constraints before environment-changing tool calls are executed, blocking policy violations. Across four customer-service domains and a mixed panel of open- and closed-weight models, LedgerAgent improves average pass^k over a standard prompt-based tool-calling approach, with the largest gains under stricter multi-trial consistency metrics.
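The mechanism described in the abstract (a separate ledger of observed state, rendered into the prompt and consulted before environment-changing tool calls) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Ledger`, `PolicyRule`, `guarded_call`, the `cancel_order` tool, the `order:<id>:status` key format, and the "only pending orders can be cancelled" rule) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# NOTE: all names in this sketch are hypothetical, not the paper's API.


@dataclass
class Ledger:
    """Explicit task state: facts observed from the user or from tool returns."""
    facts: Dict[str, Any] = field(default_factory=dict)

    def record(self, key: str, value: Any) -> None:
        # A newer observation overwrites the older one, so the ledger never
        # holds two conflicting values for the same key.
        self.facts[key] = value

    def render(self) -> str:
        # Text block that would be placed into the prompt on each turn.
        lines = [f"- {k}: {v}" for k, v in sorted(self.facts.items())]
        return "TASK STATE\n" + "\n".join(lines)


# A rule returns None when the call is allowed, or a reason string when blocked.
PolicyRule = Callable[[Ledger, Dict[str, Any]], Optional[str]]


def only_pending_orders(ledger: Ledger, args: Dict[str, Any]) -> Optional[str]:
    """Assumed example policy: an order may be cancelled only while pending."""
    status = ledger.facts.get(f"order:{args['order_id']}:status")
    if status is None:
        return "order status has not been observed yet"
    if status != "pending":
        return f"order is '{status}', only pending orders can be cancelled"
    return None


# Rules are attached to environment-changing tools only; read-only tools
# have no entry and therefore pass straight through.
RULES: Dict[str, List[PolicyRule]] = {"cancel_order": [only_pending_orders]}


def guarded_call(
    ledger: Ledger,
    tool_name: str,
    args: Dict[str, Any],
    tools: Dict[str, Callable[[Dict[str, Any]], Any]],
) -> Tuple[bool, Any]:
    """Check ledger-based rules first; execute the tool only if none objects."""
    for rule in RULES.get(tool_name, []):
        reason = rule(ledger, args)
        if reason is not None:
            return False, reason  # blocked: the tool is never executed
    return True, tools[tool_name](args)


# Toy tools standing in for a real customer-service environment.
def get_order(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": "shipped"}


def cancel_order(args: Dict[str, Any]) -> str:
    return f"cancelled {args['order_id']}"


TOOLS = {"get_order": get_order, "cancel_order": cancel_order}

ledger = Ledger()
ok, result = guarded_call(ledger, "get_order", {"order_id": "W1"}, TOOLS)
ledger.record("order:W1:status", result["status"])  # observation enters the ledger
ok, reason = guarded_call(ledger, "cancel_order", {"order_id": "W1"}, TOOLS)
print(ok, reason)       # the cancel is blocked because the observed status is 'shipped'
print(ledger.render())  # state text that would be rendered into the prompt
```

In this sketch a blocked call returns the reason instead of executing, so an agent loop could feed that reason back to the model. How LedgerAgent itself populates the ledger, expresses policy constraints, and handles blocked calls is not specified in the abstract.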