MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs
2026-05-11 • Artificial Intelligence
AI summary
The authors introduce MAGE, a system in which multiple agents collaborate through a co-evolutionary knowledge graph to help a language model improve without updating its weights. The graph records both teacher-written corrections of past failures and the model's own earlier correct reasoning traces, which are retrieved to guide the model on new tasks. MAGE evolves by updating the graph and its decision policies while the main model stays frozen, improving performance across diverse problems. Experiments show that the model's own success memories and teacher corrections are complementary, each helping on different types of tasks.
Language model · Knowledge graph · Multi-agent system · Frozen backbone · Reinforcement learning · Curriculum learning · Episodic memory · Task-conditioned retrieval · Bandit algorithms · Reasoning tasks
Authors
Ruiyi Yang, Zechen Li, Hao Xue, Imran Razzak, Flora D. Salim
Abstract
Self-evolving language-model agents must decide what to learn next and how to preserve what they have learned across iterations. Existing systems typically carry this cross-iteration knowledge as natural-language feedback, flat episodic memory, or implicit reinforcement signals, none of which cleanly supports a frozen weak backbone at inference time. This paper introduces MAGE (Multi-Agent Graph-guided Evolution), a framework that externalizes self-knowledge into a four-subgraph co-evolutionary knowledge graph. Its experience subgraph stores both teacher-written failure corrections and the learner's own past correct reasoning traces, which are retrieved as task-conditioned guidance for a frozen execution model. During evolution, the graph, a task-level search bandit, and a skill-level routing bandit are updated from the same reward stream, while the learner's backbone remains unchanged. We further provide structural analysis showing how append-only memory growth, bounded curriculum coverage, and task-filtered retrieval together support stable improvement of the retrieval substrate for frozen-learner evolution. Across nine benchmarks spanning mathematical reasoning, multi-hop and open-domain question answering, spatio-temporal analysis, financial numerical reasoning, medical multiple-choice, an open-world survival game, and web navigation, MAGE achieves strong performance against prompt-based frozen-backbone baselines. Ablations show that self-harvested success traces and teacher-written corrections are complementary, with success memories contributing most on reasoning-template-heavy tasks and corrective memories supporting harder composition and interaction settings.
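The evolution loop the abstract describes — append-only experience memory, task-conditioned retrieval for a frozen learner, and a routing bandit updated from the same reward stream — can be sketched as follows. All names here (`ExperienceGraph`, `SkillBandit`, the `"success"`/`"correction"` entry kinds) are illustrative assumptions for this sketch, not the paper's actual implementation; the bandit is a standard UCB1 stand-in for the paper's skill-level router.

```python
import math

class ExperienceGraph:
    """Append-only store of (task_type, trace, kind) entries.
    kind: 'success' (self-harvested trace) or 'correction' (teacher-written).
    Names and structure are assumptions for illustration."""
    def __init__(self):
        self.entries = []

    def add(self, task_type, trace, kind):
        # Append-only growth: nothing is ever overwritten or deleted.
        self.entries.append((task_type, trace, kind))

    def retrieve(self, task_type, k=3):
        # Task-conditioned retrieval: filter by task type, newest first.
        hits = [e for e in self.entries if e[0] == task_type]
        return hits[-k:][::-1]

class SkillBandit:
    """UCB1 bandit routing each task to a skill/strategy arm."""
    def __init__(self, arms):
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}

    def select(self):
        for a, n in self.counts.items():
            if n == 0:
                return a  # play each arm once before using UCB scores
        t = sum(self.counts.values())
        return max(self.counts, key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(t) / self.counts[a]))

    def update(self, arm, reward):
        # Incremental running mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# One evolution step with the backbone frozen: only the graph entries
# and bandit statistics change, never the learner's weights.
graph = ExperienceGraph()
graph.add("math", "decompose, then compute stepwise", "success")
bandit = SkillBandit(["direct", "decompose"])

guidance = graph.retrieve("math")   # task-conditioned guidance for the learner
arm = bandit.select()               # skill-level routing decision
reward = 1.0                        # e.g. the task was solved correctly
bandit.update(arm, reward)          # same reward stream updates the router
graph.add("math", f"solved via {arm}", "success")  # harvest the new trace
```

The key property the sketch mirrors is that the retrieval substrate and the routing statistics improve across iterations while the execution model itself is never touched.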