MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

2026-06-04

Artificial Intelligence, Computation and Language
AI summary

The authors present MLEvolve, a multi-agent framework in which language-model agents iteratively improve machine learning solutions over long-running tasks. It targets three weaknesses of existing agents: search branches that cannot share information, search that keeps no memory of past attempts, and a lack of hierarchical control. Progressive MCGS shares information across search branches and gradually shifts from broad exploration to focused exploitation. A Retrospective Memory stores and retrieves past experience, and strategic planning is separated from code generation for more stable iteration. On MLE-Bench, MLEvolve reports state-of-the-art average medal rate and valid submission rate under a 12-hour budget, half the standard runtime, and it also outperforms AlphaEvolve on mathematical algorithm optimization tasks. The authors have released their code.

Large Language Models, Machine Learning Engineering, Multi-Agent Systems, Progressive MCGS, Retrospective Memory, Hierarchical Control, Tree Search, Algorithm Discovery, Cross-Domain Generalization, Entropy-inspired Schedule
Authors
Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao, Shiyang Feng, Zichen Liang, Boyuan Sun, Tianshuo Peng, Yifan Zhou, Xin Li, Jie Zhou, Liang He, Bo Zhang, Lei Bai
Abstract
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and a lack of hierarchical control, which together hinder long-horizon optimization. We present MLEvolve, an LLM-based self-evolving multi-agent framework for end-to-end machine learning algorithm discovery. By extending tree search to Progressive MCGS, MLEvolve enables cross-branch information flow through graph-based reference edges and gradually shifts the search from broad exploration to focused exploitation with an entropy-inspired progressive schedule. To allow the agent to evolve with accumulated experience, we introduce Retrospective Memory, which combines a cold-start domain knowledge base with a dynamic global memory for task-specific experience retrieval and reuse. For stable long-horizon iteration, we further decouple strategic planning from code generation with adaptive coding modes. Evaluation on MLE-Bench shows that MLEvolve achieves state-of-the-art performance across multiple dimensions, including average medal rate and valid submission rate, under a 12-hour budget (half the standard runtime). MLEvolve also outperforms specialized algorithm discovery methods, including AlphaEvolve, on mathematical algorithm optimization tasks, demonstrating strong cross-domain generalization. Our code is available at https://github.com/InternScience/MLEvolve.
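The abstract does not spell out the entropy-inspired progressive schedule, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes branch selection by a softmax over branch scores whose temperature is annealed over the search budget, so that the entropy of the selection distribution falls as the search moves from exploration to exploitation. All names (`temperature`, `selection_probs`, `pick_branch`) and the parameter values are hypothetical.

```python
import math
import random


def temperature(step, total_steps, t_start=2.0, t_end=0.1):
    """Geometrically anneal the softmax temperature from t_start to t_end.

    Assumption: a geometric decay; the paper may use a different schedule.
    """
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return t_start * (t_end / t_start) ** frac


def selection_probs(scores, temp):
    """Softmax over branch scores at the given temperature."""
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temp) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_branch(scores, step, total_steps, rng=random):
    """Sample a branch index; later steps concentrate on high scorers."""
    probs = selection_probs(scores, temperature(step, total_steps))
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


if __name__ == "__main__":
    branch_scores = [0.2, 0.5, 0.9]  # hypothetical validation scores
    for step in (0, 5, 9):
        probs = selection_probs(branch_scores, temperature(step, 10))
        print(step, round(entropy(probs), 3))
```

Early in the run the high temperature keeps the distribution close to uniform, so weaker branches are still expanded. Late in the run the low temperature concentrates almost all probability on the best-scoring branch.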