A Causal Model of Theory of Mind in Conflict for Artificial Intelligence
2026-06-15 • Artificial Intelligence
Artificial Intelligence • Human-Computer Interaction
AI summary
The author explores when it makes sense for AI to use theory of mind (ToM), the ability to understand others' thoughts and intentions. Rather than assuming AI should always try to mentalize, the paper proposes a model that decides when ToM should be used, based on the situation and the characteristics of the agents involved. The model is a causal network in which situational and agent-level factors lead to ToM engagement through three distinct pathways, with the goal of more accurate reasoning about others. The work aims to make AI systems more efficient and trustworthy by having them mentalize only when doing so actually helps.
Theory of Mind, Artificial Intelligence, Causal Model, Directed Acyclic Graph, Mentalizing, Epistemic Accuracy, Resource-Rationality, Human-Machine Interaction, Social Reasoning, Conflict Resolution
Authors
Nikolos Gurney
Abstract
Theory of mind (ToM), the capacity to ascribe mental states to others and use those ascriptions for prediction and inference, is widely assumed to be essential for effective human-machine integration. Existing AI-ToM models address how to mentalize but leave the question of when to mentalize largely unaddressed. The central question is: under what situational and agent-level conditions is ToM engagement causally warranted in conflict? This paper presents a structural causal model, formalized as a directed acyclic graph (DAG), that treats ToM as a mechanism activated by situational and agent-level conditions rather than as an always-on capacity. The model specifies four exogenous variables capturing situational and agent-level conditions, five endogenous mediators, and a mechanistic ToM node that produces engagement states through three distinct causal pathways: a tractability pathway, a reasoning-depth pathway, and an enabling-cause pathway. The primary outcome is epistemic accuracy, which decouples social reasoning from behavioral policy and generalizes across social phenomena beyond conflict. The framework gives AI systems a principled, resource-rational decision procedure for mentalizing, with implications for efficiency, trust, and the development of robust artificial social intelligence. The paper also discusses simulation validation, empirical human-machine teaming studies, and ethical considerations arising from conflict-optimized mentalizing.
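As a rough illustration of what "formalized as a directed acyclic graph" and "a resource-rational decision procedure" can mean in code, the sketch below encodes a graph with the same node counts as the abstract (four exogenous variables, five mediators, one ToM node, one accuracy outcome) and checks that it is acyclic with Kahn's topological sort. The abstract does not name the variables or give the edges, so every node name (`X1`..`X4`, `M1`..`M5`, `ToM`, `Accuracy`), the edge list `EDGES`, and the helpers `topological_order` and `should_mentalize` are hypothetical. The three pathways are not modeled. The gate is a generic cost-benefit rule standing in for a resource-rational policy, not the author's procedure.

```python
from collections import deque

# Hypothetical node names: the abstract gives only the counts
# (4 exogenous variables, 5 endogenous mediators, one ToM node, one outcome).
EXOGENOUS = ["X1", "X2", "X3", "X4"]
MEDIATORS = ["M1", "M2", "M3", "M4", "M5"]

# Hypothetical wiring (parent -> children); not the paper's actual edges.
EDGES = {
    "X1": ["M1", "M2"],
    "X2": ["M2", "M3"],
    "X3": ["M4"],
    "X4": ["M5"],
    "M1": ["ToM"],
    "M2": ["ToM"],
    "M3": ["M4"],
    "M4": ["ToM"],
    "M5": ["ToM", "Accuracy"],
    "ToM": ["Accuracy"],
    "Accuracy": [],
}


def topological_order(edges):
    """Return a topological order via Kahn's algorithm; raise if a cycle exists."""
    indegree = {node: 0 for node in edges}
    for children in edges.values():
        for child in children:
            indegree[child] = indegree.get(child, 0) + 1
    # Start from nodes with no parents (here, the exogenous variables).
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in edges.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes left unvisited sit on a cycle, so the graph is not a DAG.
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


def should_mentalize(acc_with_tom, acc_without_tom, cost):
    """Hypothetical resource-rational gate: engage ToM only when the expected
    gain in epistemic accuracy exceeds the cost of mentalizing (same units)."""
    return acc_with_tom - acc_without_tom > cost


if __name__ == "__main__":
    print(topological_order(EDGES))
    print(should_mentalize(0.82, 0.70, 0.05))  # gain of 0.12 exceeds cost 0.05
    print(should_mentalize(0.72, 0.70, 0.05))  # gain of 0.02 does not
```

A topological order places every cause before its effects, which is what makes it possible to evaluate a structural causal model node by node. Under this sketch, a ToM-gating rule would be evaluated only after the exogenous variables and mediators that feed it are known.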