Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents

2026-05-25 • Artificial Intelligence


AI summary

The authors argue that current memory systems for large language models (LLMs) apply the same storage policy to every user, which wastes the limited memory budget on transient interactions and misses context that matters for a user's long-horizon tasks. They introduce PerMemBench, a benchmark that measures how well memory systems personalize to different users, built from multi-year, multi-domain interaction histories across diverse user personas. They also propose session-level storage gating, a lightweight method that skips memory operations for transient sessions. Their experiments show that personalization yields substantial retention gains when the gate is perfect, but that building an accurate gate remains an open challenge.

large language models, memory systems, personalization, memory policies, benchmark, PerMemBench, session level storage gating, long horizon tasks, transient interactions, user personas

Authors

Yeonjun In, Wonjoong Kim, Sangwu Park, Kanghoon Yoon, Chanyoung Park

Abstract

Existing large language model (LLM)-based memory systems apply universal, static policies that overlook a fundamental reality: the contexts worth storing in memory differ across users. This misalignment wastes limited memory budget on transient interactions while failing to preserve critical context for long-horizon tasks. To address this gap, we investigate an underexplored question: can LLM-based memory systems learn personalized memory policies? We introduce PerMemBench, the first benchmark for evaluating personalized memory systems, featuring multi-year, multi-domain interaction histories across diverse user personas. We further present the first empirical study of memory personalization, proposing session-level storage gating, a lightweight framework that selectively bypasses memory operations for transient sessions. Our study confirms that personalization yields substantial retention gains under perfect gating, yet reveals that accurate gating remains an open and critical challenge.
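
The gating idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Session` and `GatedMemory` classes, the `is_transient` callback, and the keyword-based `keyword_gate` are all hypothetical names introduced here, and the keyword rule is only a toy stand-in for whatever learned, per-user gate the paper evaluates. The sketch shows only the control flow the abstract names: a per-session decision that bypasses the memory write when the session is judged transient.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    user_id: str
    text: str


@dataclass
class GatedMemory:
    # Hypothetical gate: returns True when a session looks transient.
    is_transient: Callable[[Session], bool]
    store: List[str] = field(default_factory=list)
    skipped: int = 0

    def observe(self, session: Session) -> bool:
        """Write the session to memory unless the gate flags it as transient."""
        if self.is_transient(session):
            self.skipped += 1  # memory operations are bypassed for this session
            return False
        self.store.append(session.text)
        return True


def keyword_gate(session: Session) -> bool:
    # Toy stand-in for a learned gate; the marker list is illustrative only.
    transient_markers = ("weather", "convert", "what time")
    return any(marker in session.text.lower() for marker in transient_markers)


memory = GatedMemory(is_transient=keyword_gate)
sessions = [
    Session("u1", "What is the weather in Seoul today?"),
    Session("u1", "My thesis defense is scheduled for next March."),
    Session("u1", "Convert 5 miles to kilometers."),
]
results = [memory.observe(s) for s in sessions]
```

In this framing, the "perfect gating" condition mentioned in the abstract would correspond to replacing `keyword_gate` with an oracle that always labels sessions correctly; the gap between that oracle and any practical gate is the open challenge the abstract points to.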
