Joint Agent Memory and Exploration Learning via Novelty Signals
2026-06-01 • Artificial Intelligence
AI summary
The authors tackle the problem of how AI agents can explore environments more effectively by remembering what they have already done, without the computing cost of storing their full interaction history. They built a system called JAMEL that trains the agent's memory and its exploration skills together, using signals that reward new discoveries during interaction. This joint training helps the agent tell behaviors it has already exhausted apart from ones that are still new, and the resulting explorer carries over to environments it has not seen before. Tests show that JAMEL explores more effectively than open-weight baselines and rivals the exploration depth of a closed-source model while using fewer tokens.
Open-ended environments, Exploration policy, Latent memory, Novelty-driven interaction, Agent memory, Code coverage, GUI domain, Token consumption, Autonomous agents, Reinforcement learning
Authors
Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen, Guohong Liu, Yuebing Song, Jiacheng Liu, Yuchen Li, Dawei Yin, Ting Cao, Yunxin Liu, Yuanchun Li
Abstract
In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle to explore effectively. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. Latent memory offers a way to compress interaction histories, but its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and the exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By using deterministic and persistent novelty signals, such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations demonstrate that JAMEL generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
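To make the idea of a deterministic, persistent novelty signal concrete, here is a minimal sketch that turns code-coverage observations into a per-step reward: an action earns credit only for code units executed for the first time in the episode, and revisiting already-covered code earns nothing. This is an illustration under stated assumptions, not JAMEL's actual reward or training code. The class name `CoverageNoveltyTracker`, the method-level coverage granularity, and the example method names are all hypothetical.

```python
from collections.abc import Iterable


class CoverageNoveltyTracker:
    """Hypothetical helper: converts code-coverage snapshots into a novelty reward.

    Assumption (not from the paper): coverage is reported per step as a set of
    executed code-unit identifiers, e.g. method names.
    """

    def __init__(self) -> None:
        # Persistent within an episode: a unit, once covered, stays covered.
        self.covered: set[str] = set()

    def step(self, executed_units: Iterable[str]) -> int:
        """Return how many code units were executed for the first time."""
        executed = set(executed_units)
        new_units = executed - self.covered  # deterministic: same input, same reward
        self.covered |= new_units
        return len(new_units)


# Illustrative trajectory: each entry is the (hypothetical) set of methods
# executed after one GUI action.
tracker = CoverageNoveltyTracker()
trajectory = [
    {"MainActivity.onCreate", "MainActivity.onResume"},  # launch the app
    {"SettingsActivity.onCreate"},                       # open Settings
    {"MainActivity.onResume"},                           # go back: nothing new
    {"SettingsActivity.onCreate", "SettingsActivity.toggleDarkMode"},
]
rewards = [tracker.step(units) for units in trajectory]
print(rewards)               # [2, 1, 0, 1]
print(len(tracker.covered))  # 4
```

The zero reward at the third step shows why memory matters in this setting: once code is covered, repeating the behavior that reached it yields no further signal, so an agent that cannot recall what it has already exercised wastes steps on exhausted behaviors.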