The Token Not Taken: Sampling, State, and the Variability of AI Agent Outputs

2026-06-08, Artificial Intelligence

Artificial Intelligence, Computers and Society
AI summary

The authors explain why AI agents can give different answers to the same request across runs. They show that part of this variability comes from randomness in how the model samples tokens during generation. Other factors, such as changing environments and details of the serving system, also affect outcomes. By separating these causes, the authors clarify when agent behavior is genuinely stochastic and when it can still vary even under deterministic execution.

agentic AI, foundation model, token generation, stochasticity, deterministic execution, tool calls, state updates, sampling, orchestration loop, pseudo-random number generator
Authors
Muhammad Zia Hydari, Raja Iqbal
Abstract
Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often conflated. A foundation model is a large pretrained model, usually adaptable to many downstream tasks, that maps an input context to predictions over outputs. In many current agents, that model is embedded in an orchestration loop that plans, calls tools, observes results, and updates state. One explicit intrinsic source of variability in such systems is token generation: the model computes scores over possible next tokens, the scores are converted into probabilities, and a decoder may sample tokens using a pseudo-random number generator. A small sampled token difference can then propagate upward into a different tool call, code path, search query, or agent state. Other sources of variability are extrinsic to token sampling, including changing environments, live data, serving infrastructure, batch effects, and numerical details. By separating these layers, the manuscript clarifies what it means to call agentic AI systems stochastic, when such variability can be reproduced under matched conditions, and why deterministic execution need not imply identical behavior in deployed settings.
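
The token-generation step described in the abstract (scores over next tokens, conversion to probabilities, sampling with a pseudo-random number generator) can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, the scores, and the names `softmax`, `sample_token`, `greedy_token`, and `run` are hypothetical, and a real decoder works over a far larger vocabulary with scores computed by the model.

```python
import math
import random

# Hypothetical four-token vocabulary and fixed scores (logits), standing in
# for what a real model would compute from its input context.
VOCAB = ["search", "read_file", "edit_code", "answer"]
SCORES = [2.0, 1.8, 0.5, 0.1]


def softmax(scores, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(scores, vocab, rng, temperature=1.0):
    """Draw one token according to the softmax probabilities."""
    probs = softmax(scores, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


def greedy_token(scores, vocab):
    """Pick the highest-scoring token; no random number generator involved."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return vocab[best]


def run(seed, steps=5):
    """Sample a short token sequence from a PRNG seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_token(SCORES, VOCAB, rng) for _ in range(steps)]


if __name__ == "__main__":
    print("seed 0:", run(0))
    print("seed 0 again:", run(0))  # same seed, same sequence
    print("seed 1:", run(1))        # different seed, possibly different sequence
    print("greedy:", greedy_token(SCORES, VOCAB))
```

In this sketch, rerunning with the same seed reproduces the same sequence, which corresponds to the abstract's notion of reproducing variability under matched conditions. It models only the sampling layer; the extrinsic sources listed in the abstract (changing environments, live data, serving infrastructure, batch effects, numerical details) are outside its scope.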