Unified Context Evolution for LLM Agents

2026-06-01 · Computation and Language

AI summary

The authors present Unified Context Evolution (UCE), a method that helps LLM agents learn from past experience across tasks. Instead of starting every task from the same fixed context, agents build and update a library of knowledge units of four types (Memory, Strategy, Workflow, and Skill), each scored by the outcomes of its repeated use and pruned when no longer valuable. A scheduler directs each cycle's generation budget toward the types where the library is weakest. On two interactive benchmarks, UCE raises ALFWorld success from 75.4% to 96.3% and WebShop task score from 45.1% to 61.3%, and the learned library transfers to other actor backbones without retraining.

LLM-based agents, interactive tasks, context evolution, knowledge library, experience replay, memory types, strategy learning, workflow, skill acquisition, task transfer
Authors
Zixuan Zhu, Yitong Hu, Yong Dai, Junfeng Fang, Chunyang Jiang, Senkang Hu, Yuzhi Zhao
Abstract
LLM-based agents can solve multi-step interactive tasks by combining reasoning with environment feedback, yet each episode starts from the same fixed context and any useful strategy discovered along the way is lost once the task ends. Existing approaches either limit learning to the current task or pool all experience into a single untyped store, without distinguishing knowledge types, tracking quality through use, or balancing what the library still lacks. We introduce Unified Context Evolution (UCE), a gradient-free framework that externalizes agent experience into an evolving library of typed Evolvable Context Units (ECUs). UCE decomposes experience into four complementary types (Memory, Strategy, Workflow, and Skill), each generated from trajectories under type-specific conditions, retrieved at decision time, scored through repeated usage outcomes, and pruned when no longer valuable. A scheduling module allocates each cycle's generation budget toward the types where the library is weakest. Across two interactive benchmarks, UCE raises ALFWorld success from 75.4% to 96.3% and WebShop task score from 45.1% to 61.3%, and the accumulated library transfers to alternative actor backbones without retraining.
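
The abstract describes the ECU lifecycle (generate, retrieve, score through use, prune) and a scheduler that favors weak types, but it does not give the scoring rule, pruning criterion, or allocation formula. The Python sketch below is therefore illustrative only, not the authors' implementation. Everything beyond the four type names is an assumption: the class and method names, the Laplace-smoothed success rate used as a score, the pruning thresholds, the definition of weakness as one minus a type's mean score, and retrieval by type and score alone (no query matching).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ECUType(Enum):
    # The four knowledge types named in the abstract.
    MEMORY = "memory"
    STRATEGY = "strategy"
    WORKFLOW = "workflow"
    SKILL = "skill"


@dataclass
class ECU:
    """One Evolvable Context Unit: typed text plus usage statistics."""
    kind: ECUType
    text: str
    uses: int = 0
    successes: int = 0

    @property
    def score(self) -> float:
        # Assumption: Laplace-smoothed success rate, so an unused unit scores 0.5.
        return (self.successes + 1) / (self.uses + 2)


@dataclass
class Library:
    units: List[ECU] = field(default_factory=list)

    def add(self, kind: ECUType, text: str) -> ECU:
        unit = ECU(kind, text)
        self.units.append(unit)
        return unit

    def retrieve(self, kind: ECUType, k: int = 1) -> List[ECU]:
        # Assumption: rank by score within a type; a real system would also match the task.
        candidates = [u for u in self.units if u.kind is kind]
        return sorted(candidates, key=lambda u: u.score, reverse=True)[:k]

    def record(self, unit: ECU, success: bool) -> None:
        # Update a unit's statistics after an episode that used it.
        unit.uses += 1
        unit.successes += int(success)

    def prune(self, min_score: float = 0.3, min_uses: int = 3) -> None:
        # Keep units that are still under-tested or that score well enough.
        self.units = [
            u for u in self.units if u.uses < min_uses or u.score >= min_score
        ]

    def allocate(self, budget: int) -> Dict[ECUType, int]:
        # Weakness of a type = 1 - mean score; an empty type is maximally weak.
        weakness = {}
        for kind in ECUType:
            scores = [u.score for u in self.units if u.kind is kind]
            mean = sum(scores) / len(scores) if scores else 0.0
            weakness[kind] = 1.0 - mean
        total = sum(weakness.values())  # > 0 because every score is below 1
        shares = {k: budget * w / total for k, w in weakness.items()}
        alloc = {k: int(s) for k, s in shares.items()}
        # Hand out the slots lost to rounding, largest fractional part first.
        leftover = budget - sum(alloc.values())
        by_fraction = sorted(shares, key=lambda k: shares[k] - alloc[k], reverse=True)
        for k in by_fraction[:leftover]:
            alloc[k] += 1
        return alloc
```

In this sketch, a library that holds one well-performing Strategy unit and nothing else would send most of the next cycle's generation budget to Memory, Workflow, and Skill, which mirrors the abstract's description of allocating toward the types where the library is weakest.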