Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents

2026-04-09 · Artificial Intelligence

Artificial Intelligence, Computation and Language, Multiagent Systems
AI summary

The authors introduce TrACE, a method that lets large language model (LLM) agents decide how much compute to spend at each decision step based on how consistently the model's own samples agree. Instead of giving every step the same budget, TrACE samples a few candidate next actions and checks whether they match. If they agree, it commits immediately; if not, it samples more rollouts, up to a configurable cap, before committing to the plurality action. In the authors' tests, TrACE matches the accuracy of fixed-budget self-consistency while using 33% to 65% fewer LLM calls. The approach requires no extra training, external verifier, or human labels, and is evaluated on both single-step reasoning (GSM8K) and multi-step household navigation (MiniHouse).

large language models, inference-time compute scaling, self-consistency, adaptive compute, multi-step decision tasks, agreement measurement, greedy decoding, rollouts, training-free methods, step-level success
Authors
Khushal Sethi
Abstract
Inference-time compute scaling has emerged as a powerful technique for improving the reliability of large language model (LLM) agents, but existing methods apply compute uniformly: every decision step receives the same budget regardless of its difficulty. We introduce TrACE (Trajectorical Adaptive Compute via agrEement), a training-free controller that allocates LLM calls adaptively across agent timesteps by measuring inter-rollout action agreement. At each step, TrACE samples a small set of candidate next actions and measures how consistently the model commits to the same action. High agreement signals an easy decision; the controller commits immediately. Low agreement signals uncertainty; the controller samples additional rollouts up to a configurable cap before committing to the plurality action. No learned components, no external verifier, and no human labels are required. We evaluate TrACE against greedy decoding and fixed-budget self-consistency (SC-4, SC-8) on two benchmarks spanning single-step reasoning (GSM8K, n=50) and multi-step household navigation (MiniHouse, n=30), using a Qwen 2.5 3B Instruct model running on CPU. TrACE-4 matches SC-4 accuracy while using 33% fewer LLM calls on GSM8K and 39% fewer on MiniHouse. TrACE-8 matches SC-8 accuracy with 55% fewer calls on GSM8K and 65% fewer on MiniHouse. We further show that inter-rollout agreement is a reliable signal of step-level success, validating the core hypothesis that the model's own output consistency encodes difficulty information that can be exploited without training. TrACE is the first training-free, per-timestep adaptive-compute controller for LLM agents to be evaluated on multi-step sequential decision tasks.
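The per-step control loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `trace_step`, the initial sample count of 2, the use of the plurality share as the agreement measure, and the default threshold of 1.0 are all assumptions, since the abstract does not specify them. The `sample_action` callable stands in for one sampled LLM call that returns a candidate next action.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def trace_step(
    sample_action: Callable[[], Hashable],
    initial_samples: int = 2,          # assumed size of the initial "small set"
    max_samples: int = 4,              # configurable cap (e.g. 4 or 8)
    agreement_threshold: float = 1.0,  # assumed: commit early only on full agreement
) -> Tuple[Hashable, int]:
    """Choose one action for a single timestep.

    Returns (committed_action, number_of_calls_used).
    """
    if initial_samples < 1:
        raise ValueError("initial_samples must be at least 1")

    # Start with a small set of candidate next actions.
    actions = [sample_action() for _ in range(initial_samples)]

    while True:
        # Agreement here is the share of samples that pick the plurality
        # action (an assumed measure; the paper may define it differently).
        top_action, top_count = Counter(actions).most_common(1)[0]
        agreement = top_count / len(actions)

        # High agreement: commit immediately. Otherwise keep sampling
        # until the cap is reached, then commit to the plurality action.
        if agreement >= agreement_threshold or len(actions) >= max_samples:
            return top_action, len(actions)

        actions.append(sample_action())


# Easy step: every sample agrees, so only the initial calls are spent.
easy_action, easy_calls = trace_step(lambda: "open fridge", max_samples=8)

# Hard step: samples disagree, so the controller samples up to the cap.
scripted = iter(["go left", "go right", "go right", "go right"])
hard_action, hard_calls = trace_step(lambda: next(scripted), max_samples=4)
```

Under these assumptions, an easy step costs only the initial samples, while a contested step costs at most `max_samples` calls, the same per-step budget as the fixed self-consistency baseline it is compared against. The savings reported in the abstract would then come from the share of steps that commit early.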