Measuring Behavior Portability in Large Language Models
2026-06-22 • Artificial Intelligence
Artificial Intelligence • Computers and Society • Computer Science and Game Theory
AI summary
The authors studied how large language models (LLMs) make decisions across environments that share the same payoffs but differ in surface presentation. They found that even when the underlying incentives are identical, the models' behavior can change substantially with how the environment is presented. To measure this, they developed a way to test whether behavior observed in one setting predicts behavior in another, structurally equivalent setting. Their experiments showed that such behavior does not transfer well: understanding an LLM's decisions in one scenario does not guarantee that you can predict its decisions in another that looks different but is structurally the same.
large language models, decision environments, behavioral mapping, payoff structure, behavioral portability, out-of-sample prediction, oracle model, economic decision problems, loss-agnostic measure
Authors
Tianjia Dong, Nadav Kunievsky, James A. Evans
Abstract
Large language models are increasingly deployed as autonomous decision makers, yet the behavioral mapping they exhibit can vary substantially across decision environments that are payoff-equivalent by construction, that is, environments that share identical payoff-relevant structure but differ in surface presentation. This sensitivity renders suite-based evaluation fragile and raises a fundamental question of behavioral portability: how informative is a behavioral mapping learned in one decision environment about another that preserves the same underlying incentive structure? We introduce a formal framework to measure this property. Our protocol fits an interpretable behavioral model on data pooled from a set of source environments and evaluates its out-of-sample predictive performance in a held-out target environment, benchmarking against an oracle trained directly on target data. Portability is quantified via a loss-agnostic measure that delivers worst-case bounds on the performance of the induced prediction-action mapping in the target environment. In controlled experiments spanning seven canonical economic decision problems, we document substantial and systematic portability losses, suggesting that behavioral characterizations of LLMs obtained in one decision environment cannot be assumed to transfer reliably to structurally equivalent alternatives.
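The source-versus-oracle comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the behavioral model is a simple lookup table of action rates per payoff condition, the loss is a Brier (squared-error) score, the toy observations are hand-made, and the names `fit_rates`, `brier_loss`, and `portability_gap` are hypothetical. In particular, the reported gap is a plain excess loss under one chosen loss function, not the loss-agnostic measure the authors propose.

```python
from collections import defaultdict


def fit_rates(observations):
    """Fit a lookup-table behavioral model: P(action = 1 | condition)."""
    counts = defaultdict(lambda: [0, 0])  # condition -> [actions taken, total]
    for condition, action in observations:
        counts[condition][0] += action
        counts[condition][1] += 1
    return {c: taken / total for c, (taken, total) in counts.items()}


def brier_loss(model, observations, default=0.5):
    """Mean squared error between predicted probability and observed action."""
    errors = [(model.get(c, default) - a) ** 2 for c, a in observations]
    return sum(errors) / len(errors)


def portability_gap(sources, target_train, target_test, loss=brier_loss):
    """Compare a model fit on pooled source data with a target-trained oracle.

    Both models are scored on the same held-out target observations.
    Returns (transfer_loss, oracle_loss, excess_loss).
    """
    pooled = [obs for env in sources for obs in env]
    transferred = fit_rates(pooled)      # fit on source environments only
    oracle = fit_rates(target_train)     # fit directly on target data
    transfer_loss = loss(transferred, target_test)
    oracle_loss = loss(oracle, target_test)
    return transfer_loss, oracle_loss, transfer_loss - oracle_loss


# Toy data (hypothetical): each observation is (payoff condition, binary action).
# The two source framings agree with each other; the target framing flips behavior.
source_a = [("low", 1), ("low", 1), ("high", 0), ("high", 0)]
source_b = [("low", 1), ("low", 1), ("high", 0), ("high", 0)]
target_train = [("low", 0), ("low", 0), ("high", 1), ("high", 1)]
target_test = [("low", 0), ("high", 1)]

transfer_loss, oracle_loss, gap = portability_gap(
    [source_a, source_b], target_train, target_test
)
print(f"transfer={transfer_loss:.2f} oracle={oracle_loss:.2f} gap={gap:.2f}")
```

In this deliberately extreme toy case the oracle predicts the held-out target choices perfectly while the pooled source model gets every prediction wrong, so the excess loss is as large as the Brier score allows. A gap near zero would instead indicate that behavior learned in the source environments carries over to the target.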