Mandol: An Agglomerative Agent Memory System for Long-Term Conversations

2026-06-29 · Databases

Databases, Artificial Intelligence, Computation and Language, Information Retrieval
AI summary

The authors propose Mandol, a memory system for conversational agents that must remember information across multiple sessions. Existing systems split memory across separate vector and graph databases, which fragments it and adds cross-database I/O latency; Mandol instead stores everything in one unified structure built on semantic graphs. Retrieval uses query-adaptive routing, quantitative denoising and conflict resolution, and token-constrained context generation, none of which call the language model. On the LoCoMo and LongMemEval benchmarks, Mandol reports the best overall accuracy among the compared memory systems, along with faster retrieval and insertion.

Conversational agents, Long-term memory, Semantic graphs, Memory retrieval, Vector databases, Graph databases, RAG methods, Context generation, Query routing, Latency
Authors
Yuhan Zhang, Zhiyuan Guo, Ziheng Zeng, Wei Wang, Wentao Wu, Lijie Xu
Abstract
Long-term conversational agents need to remember and query cross-session, multi-typed information with complex correlations. Existing agent memory systems rely on heterogeneous vector and graph databases, which fragment memory information and cause high cross-database I/O latency. For retrieval, common RAG-style methods tend to introduce noise, miss correlated clues, and lack token budget control, degrading LLM accuracy and efficiency. We propose Mandol, an agglomerative memory system that consolidates fragmented memory representations and storage into a unified memory-native architecture. Its core components include: (1) a hierarchical memory model that organizes memory into a basic layer representing raw memory information and a high-level abstract layer that agglomerates basic memories into traceable abstract memories, both uniformly represented as structured semantic graphs; (2) an agglomerative semantic data structure combining SemanticMap and SemanticGraph, which natively fuses key-value, vector, and graph structures and provides unified hybrid retrieval operators to eliminate cross-database I/O; and (3) a quantitative query mechanism with query-adaptive routing, quantitative denoising and conflict resolution, and token-constrained context generation, all without involving LLMs during retrieval. Experiments on two widely used long-term conversation benchmarks, LoCoMo and LongMemEval, show that Mandol achieves the best overall accuracy among representative agent memory systems. In performance comparisons, Mandol also achieves a 5.4x retrieval speedup and a 4.8x insertion speedup under 10 QPS concurrent load, while maintaining low latency on consumer-grade hardware.
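
The abstract does not specify Mandol's API, so the following is only a minimal sketch of the general idea behind components (2) and (3): one in-process structure that supports key lookup, vector similarity, and graph traversal together, followed by context packing under a token budget with no LLM call. Every name here (`ToyUnifiedMemory`, `hybrid_search`, `build_context`) is hypothetical, the embeddings are hand-written two-dimensional vectors, and a whitespace word count stands in for a real tokenizer. It is not SemanticMap or SemanticGraph.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    key: str
    text: str
    embedding: list[float]
    neighbors: set[str] = field(default_factory=set)


class ToyUnifiedMemory:
    """Hypothetical single-process store: key lookup, vector search, graph hops."""

    def __init__(self) -> None:
        self.nodes: dict[str, MemoryNode] = {}

    def put(self, key: str, text: str, embedding: list[float]) -> None:
        self.nodes[key] = MemoryNode(key, text, list(embedding))

    def link(self, a: str, b: str) -> None:
        # Undirected edge between two existing memories.
        self.nodes[a].neighbors.add(b)
        self.nodes[b].neighbors.add(a)

    @staticmethod
    def _cosine(u: list[float], v: list[float]) -> float:
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0

    def hybrid_search(self, query_vec: list[float], top_k: int = 1, hops: int = 1) -> list[str]:
        """Pick top-k seeds by cosine similarity, then expand along graph edges."""
        ranked = sorted(
            self.nodes.values(),
            key=lambda n: self._cosine(query_vec, n.embedding),
            reverse=True,
        )
        ordered = [n.key for n in ranked[:top_k]]
        frontier = list(ordered)
        for _ in range(hops):
            next_frontier = []
            for key in frontier:
                for nb in sorted(self.nodes[key].neighbors):
                    if nb not in ordered:
                        ordered.append(nb)
                        next_frontier.append(nb)
            frontier = next_frontier
        return ordered

    def build_context(self, keys: list[str], token_budget: int) -> str:
        """Greedily pack memories in rank order, skipping any that exceed the budget."""
        picked, used = [], 0
        for key in keys:
            cost = len(self.nodes[key].text.split())  # crude token estimate
            if used + cost > token_budget:
                continue
            picked.append(self.nodes[key].text)
            used += cost
        return "\n".join(picked)


mem = ToyUnifiedMemory()
mem.put("s1:trip", "Alice booked a trip to Kyoto in May", [1.0, 0.0])
mem.put("s2:hotel", "Alice prefers hotels near train stations", [0.6, 0.8])
mem.put("s3:pet", "Bob adopted a cat", [0.0, 1.0])
mem.link("s1:trip", "s2:hotel")

keys = mem.hybrid_search([1.0, 0.1], top_k=1, hops=1)
context = mem.build_context(keys, token_budget=10)
```

In this toy run the vector step selects the trip memory, the graph hop pulls in the linked hotel preference from another session, and the 10-word budget then admits only the first of the two. Because all three access paths read the same in-memory objects, no call crosses a database boundary, which is the property the abstract attributes to the unified design.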