LifeSide: Benchmarking Agents as Lifelong Digital Companions

2026-06-03 • Computation and Language

AI summary

The authors introduce LifeSide, a benchmark for evaluating digital companions that interact with users over long periods. They argue that existing evaluations check memory recall and short-term empathy separately, which is not enough. The benchmark simulates ongoing, multi-session interactions with virtual users, jointly tracking memory, emotions, privacy, and environmental changes. The results show that even the best current models struggle to understand users accurately and to sustain companionship over long horizons.

digital companions, memory recall, empathy, multi-session interaction, multi-agent simulation, user modeling, privacy control, emotional companionship, benchmarks

Authors

Yuqian Wu, Zhijie Deng, Wei Chen, Junwei Li, Yutian Jiang, Junle Chen, Zhengjun Huang, Qingxiang Liu, Jing Tang, Jiaheng Wei, Yuxuan Liang

Abstract

Lifelong digital companions must integrate cross-session cues, continually update their understanding of users, and adapt to shifting privacy boundaries. Existing evaluations fail to capture this, testing memory recall and short-term empathy in isolation. To bridge this gap, we introduce LifeSide, a benchmark centered on multi-session Memory-Emotion-Environment loops. By modeling users as persistent worlds with layered profiles and event trajectories, LifeSide uses multi-agent simulation to project environmental dynamics into dialogue, preserving the critical gap between latent thoughts and observable expressions. Across 2,000 personas and 111K tasks spanning memory tracking, user understanding, privacy control, and emotional companionship, our experimental results reveal a stark reality: even models that saturate current memory benchmarks fail to sustain accurate user understanding and true companionship over long horizons.
