Should LLM Agents Decide in Social Simulations? Comparing Finite-State and LLM-Based Decision Policies

2026-06-10 • Computers and Society


AI summary

The authors tested whether large language models (LLMs) preserve a prescribed behavioral policy when selecting actions in online social network simulations. They compared LLM decisions against an explicit reference policy, a finite state machine implemented as a first-order Markov model, using a synthetic network of 1,000 agents and 10,000 action decisions. Some LLM configurations came close to the reference, but none preserved it reliably, and even the best-aligned ones were several hundred times slower than direct Markov chain sampling. Additional guidance in the prompt sometimes reduced alignment by introducing systematic action biases. Overall, the authors conclude that LLM-based action selection is not a direct replacement for explicit decision rules in these simulations.

Keywords: large language models, social simulation, online social network, Markov model, finite state machine, action selection, Jensen-Shannon Divergence, prompting strategies, model alignment, computational cost

Authors

Alejandro Buitrago López, Javier Pastor-Galindo, José A. Ruipérez-Valiente

Abstract

Large language models (LLMs) are increasingly used as decision-making components in social simulations. This introduces a methodological risk: the simulation may deviate from the explicit behavioral policy defined by the researcher. In online social network (OSN) simulations, action choices shape system dynamics, interaction patterns, and model interpretability. This paper evaluates whether LLM action selectors preserve an interpretable reference policy in an OSN simulation. The reference is a finite state machine implemented as a first-order Markov model, with transition probabilities depending on the user type. The evaluation uses a synthetic network with 1,000 agents and 10,000 action decisions. Three open-weight LLMs are tested: LLaMA 3.1, GPT-OSS, and Mistral 24B. Each model is evaluated under three prompting strategies: base, guided, and probabilistic. Alignment is measured using Jensen-Shannon Divergence with Laplace smoothing, and execution time is reported. Results show that LLMs can approximate the reference policy in some configurations, but do not preserve it reliably. Alignment varies across models and prompts, and additional guidance can introduce systematic action biases. Even the best-aligned LLM configurations are several hundred times slower than direct Markov chain sampling. These findings indicate that LLM-based action selection is not a direct replacement for explicit decision policies: it can alter the intended behavior while increasing computational cost.
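The evaluation described in the abstract can be illustrated with a minimal Python sketch: sample next actions from a first-order Markov reference policy, then score an observed action distribution against the reference row using Jensen-Shannon Divergence with Laplace smoothing. The action names, the transition probabilities, the smoothing constant `alpha=1.0`, the base-2 logarithm, and the choice to compare per-state next-action distributions are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random
from collections import Counter

# Hypothetical action set and transition table for ONE user type.
# The paper's actual actions and probabilities are not given in the abstract.
ACTIONS = ["post", "comment", "like", "idle"]
TRANSITIONS = {
    "post":    {"post": 0.10, "comment": 0.30, "like": 0.40, "idle": 0.20},
    "comment": {"post": 0.20, "comment": 0.20, "like": 0.40, "idle": 0.20},
    "like":    {"post": 0.10, "comment": 0.20, "like": 0.50, "idle": 0.20},
    "idle":    {"post": 0.25, "comment": 0.25, "like": 0.25, "idle": 0.25},
}


def sample_next(current, rng):
    """Draw the next action from the reference Markov row for `current`."""
    row = TRANSITIONS[current]
    weights = [row[a] for a in ACTIONS]
    return rng.choices(ACTIONS, weights=weights, k=1)[0]


def smoothed_distribution(counts, alpha=1.0):
    """Turn action counts into probabilities with Laplace (add-alpha) smoothing."""
    total = sum(counts.get(a, 0) for a in ACTIONS) + alpha * len(ACTIONS)
    return [(counts.get(a, 0) + alpha) / total for a in ACTIONS]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon Divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def alignment_from_state(state, observed_actions):
    """JSD between observed next actions and the reference row for `state`."""
    observed = smoothed_distribution(Counter(observed_actions))
    reference = [TRANSITIONS[state][a] for a in ACTIONS]
    return jsd(observed, reference)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for a selector's decisions: here, the reference sampler itself.
    decisions = [sample_next("post", rng) for _ in range(10_000)]
    print(alignment_from_state("post", decisions))
```

In an actual comparison, `decisions` would hold the actions returned by an LLM selector for the same state and user type, and a larger divergence would indicate a larger departure from the intended policy. Smoothing keeps every probability strictly positive, so actions the selector never chose still yield a finite score.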
