SkillOS: Learning Skill Curation for Self-Evolving Agents
2026-05-07 • Artificial Intelligence • Computation and Language
AI summary
The authors developed SkillOS, a method that helps AI agents learn from past experience by managing a collection of reusable skills. Unlike previous methods, SkillOS uses a reinforcement-learning setup in which a frozen executor retrieves and applies skills while a trainable curator learns to update and organize those skills based on feedback from related later tasks. This lets the agent improve over time at solving complex tasks, and the learned curator transfers across different AI backbones and problem domains. The researchers found that SkillOS leads to more targeted skill use and better organized skills that capture higher-level strategies.
Large Language Model (LLM) • skill curation • reinforcement learning (RL) • self-evolving agents • task dependencies • experience-driven training • agent executor • SkillRepo • composite rewards • multi-turn tasks
Authors
Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee
Abstract
LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.
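The training loop described in the abstract (a frozen executor retrieves and applies skills, a trainable curator updates an external SkillRepo, and composite rewards are computed on later related tasks in a grouped stream) can be sketched schematically. This is a toy illustration, not the paper's implementation: all names (`SkillRepo`, `execute`, `curate`, `composite_reward`) are hypothetical, keyword-overlap retrieval stands in for the executor's skill retrieval, and a rule-based stub replaces the RL-trained LLM curator.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the SkillOS loop; names and logic are illustrative,
# not the paper's actual API. The real curator is an LLM trained with RL.

@dataclass
class SkillRepo:
    skills: dict = field(default_factory=dict)  # skill name -> Markdown body

    def retrieve(self, task: str, k: int = 2):
        # Toy retrieval: rank skills by keyword overlap with the task.
        words = set(task.lower().split())
        def overlap(body):
            return len(words & set(body.lower().split()))
        ranked = sorted(self.skills.items(), key=lambda kv: overlap(kv[1]), reverse=True)
        return [name for name, _ in ranked[:k]]

def execute(repo, task):
    """Frozen executor: applies retrieved skills.
    Toy success criterion: some retrieved skill mentions a task keyword."""
    used = repo.retrieve(task)
    success = any(w in repo.skills[s].lower() for s in used for w in task.lower().split())
    return used, success

def curate(repo, task, success):
    """Trainable curator (stub): distill a new Markdown skill from a failed trajectory."""
    if not success:
        name = f"skill_{len(repo.skills)}"
        repo.skills[name] = f"## {name}\nHow to handle: {task}"

def composite_reward(repo, eval_tasks, alpha=0.1):
    """Composite reward: success rate on later related tasks minus a repo-size penalty."""
    wins = sum(execute(repo, t)[1] for t in eval_tasks)
    return wins / len(eval_tasks) - alpha * len(repo.skills)

# Grouped task stream: earlier trajectories update the SkillRepo,
# later related tasks evaluate those updates.
repo = SkillRepo()
earlier = ["parse csv file", "filter csv rows"]
later = ["merge csv tables"]
for t in earlier:
    _, ok = execute(repo, t)
    curate(repo, t, ok)
reward = composite_reward(repo, later)
```

The key design point this sketch mirrors is the credit-assignment structure: curation actions are never rewarded directly, only through executor performance on later, skill-relevant tasks in the same group, which is why the abstract stresses grouping task streams by task dependencies.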