The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

2026-05-08 · Computation and Language

Computation and Language · Artificial Intelligence · Computer Science and Game Theory · Multiagent Systems
AI summary

The authors studied how giving large language model (LLM) agents access to more past conversation history affects their ability to cooperate in social games. Surprisingly, more memory often made the models less cooperative, a problem they call the "memory curse." Their experiments show this happens because longer memory erodes the models' forward-looking planning, not because they grow more suspicious. They also found that replacing past history with friendly synthetic memories can help, and that explicit step-by-step reasoning sometimes makes things worse. Overall, the authors highlight that more memory is not always better: the effect depends on how the models use that memory during reasoning.

Large Language Models · Context Window · Multi-Agent Social Dilemmas · Cooperation · Memory Curse · Forward-Looking Intent · Chain-of-Thought Reasoning · Memory Sanitization · LoRA Adapter · Zero-Shot Transfer
Authors
Jiayuan Liu, Tianqin Li, Shiyi Du, Xin Luo, Haoxuan Zeng, Emanuel Tewolde, Tai Sing Lee, Tonghan Wang, Carl Kingsford, Vincent Conitzer
Abstract
Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model–game settings, a pattern we term the memory curse. We isolate the underlying mechanism through three analyses. First, lexical analysis of 378,000 reasoning traces associates this breakdown with eroding forward-looking intent rather than rising paranoia. We validate this using targeted fine-tuning as a cognitive probe: a LoRA adapter trained exclusively on forward-looking traces mitigates the decay and transfers zero-shot to distinct games. Second, memory sanitization holds prompt length fixed while replacing visible history with synthetic cooperative records, which restores cooperation substantially, proving the trigger is memory content, not length alone. Finally, ablating explicit Chain-of-Thought reasoning often reduces the collapse, showing that deliberation paradoxically amplifies the memory curse. Together, these results recast memory as an active determinant of multi-agent behavior: longer recall can either destabilize or support cooperation depending on the reasoning patterns it elicits.
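The memory-sanitization control described in the abstract (same round count, so roughly the same prompt length, but synthetic all-cooperative records in place of real history) can be illustrated with a minimal sketch. All function names, record fields, and the prompt wording below are hypothetical and for illustration only; the paper's actual prompt format and game setup are not given here.

```python
# Hypothetical sketch of a "memory sanitization" control: keep the
# number of visible rounds fixed, but replace every real record with
# a synthetic cooperative one. Names and prompt text are illustrative.

def sanitize_history(history):
    """Replace each real round with a synthetic mutually cooperative
    round, preserving the round count so prompt length stays comparable."""
    return [{"round": r["round"], "you": "Cooperate", "opponent": "Cooperate"}
            for r in history]

def render_prompt(history, game="iterated prisoner's dilemma"):
    """Format the (possibly sanitized) history as the context an LLM
    agent might see before choosing its next move."""
    lines = [f"You are playing an {game}. Past rounds:"]
    for r in history:
        lines.append(f"Round {r['round']}: you played {r['you']}, "
                     f"your opponent played {r['opponent']}.")
    lines.append("Choose your next move: Cooperate or Defect.")
    return "\n".join(lines)

real = [{"round": 1, "you": "Cooperate", "opponent": "Defect"},
        {"round": 2, "you": "Defect", "opponent": "Defect"}]
clean = sanitize_history(real)
assert len(clean) == len(real)                              # length preserved
assert all(r["opponent"] == "Cooperate" for r in clean)     # content replaced
```

The point of the control is that only the *content* of memory changes between the real and sanitized conditions, so any behavioral difference cannot be attributed to prompt length alone.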