Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States

2026-05-31 • Artificial Intelligence

Artificial Intelligence, Machine Learning

AI summary

The authors study decision-making in Markov decision processes with an absorbing catastrophic state, that is, a state that ends the process once it is reached. Although the agent is risk-neutral and follows standard Bellman optimality with no built-in human-like biases, its behavior resembles prospect theory: an S-shaped value function (convex near catastrophe, concave far from it), greater sensitivity to losses than to gains, and a reversal between cautious and risk-taking policies depending on whether the regime is one of growth or decline. They derive a closed-form expression showing how win probability, payoff asymmetry, and the discount factor determine the asymptotic loss-aversion plateau. The findings persist under tabular Q-learning and under several kinds of transition noise, suggesting that an absorbing failure state alone is sufficient to produce prospect-theory-like patterns.

Markov decision process, absorbing state, Bellman optimality, prospect theory, value function, loss aversion, risk-sensitive control, Q-learning, stochastic transitions, discount factor

Authors

Yujiao Chen

Abstract

We study risk-neutral control in Markov decision processes with an absorbing catastrophic state. Even though rewards are linear and the agent has no utility curvature, probability weighting, or framing dependence, standard Bellman optimality produces three prospect-theory-like signatures: an S-shaped value-function profile (convex near catastrophe, concave in the far field), an endogenous loss-sensitivity coefficient $λ^*(S) > 1$, and a reflection-effect policy reversal. Across 495 configurations, the optimal policy plays safe near catastrophe in positive-drift (growth) regimes despite the risky action's higher immediate expected value, and plays risky near catastrophe in negative-drift (decline) regimes despite the safe action's lower immediate expected loss. We derive a closed-form expression for the asymptotic loss-aversion plateau $\barλ$ that depends only on win probability $p$, payoff asymmetry $r = |Δ_\ell/Δ_w|$, and discount factor $β$, and matches numerical solutions to $R^2 = 0.999$. The mechanism does not require asymmetric payoffs. Across a sweep of $(p,β)$ at three asymmetry levels, the asymmetry share of $\barλ$ above unity has median 4.6% at $r = 1.25$ and rises to 13.9% at $r = 2$, with the boundary contribution exceeding the asymmetry contribution in every cell tested. The phenomena persist under tabular Q-learning (a model-free agent reproduces $V^*$ at correlation 0.98 in growth and 1.00 in decline) and under stochastic transitions with Gaussian, heavy-tailed Student-$t_3$, and asymmetric skew-normal noise up to 50% of the step size, where the asymptotic plateau tracks the closed-form prediction within 0.41% for safe-channel noise and within 9.6% for risky-channel or both-channel noise. These results identify absorbing failure states as a sufficient structural mechanism for prospect-theory-like behavior under optimal control.
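The growth-regime claim can be illustrated with a minimal value-iteration sketch on a toy chain. This is not the paper's model: the chain length `N`, the step sizes, the win probability, the discount factor, and the choice to clip at the top state are all assumptions made for illustration. State 0 is the absorbing catastrophe with value 0, rewards are linear in the step taken, and the risky action has the higher immediate expected value (1.5 versus 1.0).

```python
# Toy chain MDP with an absorbing catastrophic state.
# NOT the paper's model: every constant below is an illustrative assumption.
N = 60                        # largest state; state 0 is the absorbing catastrophe
BETA = 0.95                   # discount factor
P_WIN = 0.5                   # win probability of the risky action
SAFE_STEP = 1                 # safe action: deterministic +1
WIN_STEP, LOSS_STEP = 5, -2   # risky action: +5 w.p. P_WIN, otherwise -2


def next_state(s, step):
    """Move along the chain, absorbing at 0 and clipping at N."""
    return max(0, min(N, s + step))


def q_values(V, s):
    """Return (Q_safe, Q_risky) at state s; the reward equals the nominal step."""
    q_safe = SAFE_STEP + BETA * V[next_state(s, SAFE_STEP)]
    q_risky = (
        P_WIN * (WIN_STEP + BETA * V[next_state(s, WIN_STEP)])
        + (1 - P_WIN) * (LOSS_STEP + BETA * V[next_state(s, LOSS_STEP)])
    )
    return q_safe, q_risky


def value_iteration(tol=1e-10, max_iter=100_000):
    """Risk-neutral Bellman optimality; V[0] is pinned to 0 (no reward after absorption)."""
    V = [0.0] * (N + 1)
    for _ in range(max_iter):
        new_V = [0.0] + [max(q_values(V, s)) for s in range(1, N + 1)]
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    policy = ["-"]  # no decision is made in the absorbing state
    for s in range(1, N + 1):
        q_safe, q_risky = q_values(V, s)
        policy.append("safe" if q_safe >= q_risky else "risky")
    return V, policy


V, policy = value_iteration()
for s in (1, 2, 3, 5, 10, 30):
    print(s, policy[s], round(V[s], 3))
```

In this toy setting the safe action is optimal in states 1 and 2, where a losing risky step would hit the catastrophe, while the risky action is optimal far from the boundary. The sketch does not compute the loss-sensitivity coefficient $λ^*(S)$ or the plateau $\barλ$, whose definitions are not given in the abstract.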
