Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams

2026-06-01Machine Learning

Machine LearningArtificial Intelligence
AI summary

The authors studied systems that help improve large language model (LLM) agents by fine-tuning their prompts and tools based on feedback. They point out that real-world tasks keep changing and growing, which makes existing methods less reliable over time. To tackle this, they created Adaptive Auto-Harness, a system that continuously adapts to new tasks by evolving and routing different solutions, sometimes with human help. Their tests on varied real task streams showed it outperforms previous methods by better adjusting to shifting problems.

Large Language ModelsPrompt OptimizationAuto-Harness SystemsTask AdaptationMulti-Agent SystemsEvolutionary AlgorithmsReal-World Task StreamsHuman-in-the-LoopRouting MechanismsPerformance Degradation
Authors
Zewen Liu, Zhan Shi, Yisi Sang, Bing He, Minhua Lin, Tianxin Wei, Dakuo Wang, Benoit Dumoulin, Wei Jin, Hanqing Lu
Abstract
Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmarks. Real deployments instead present open-ended task streams: histories grow without a fixed endpoint, heterogeneous tasks require different harnesses, and problem distributions shift over time. These challenges make a single repeatedly and densely updated harness brittle, causing performance degradation as accuracy peaks early and then declines. This motivates sustained harness construction with task-wise adaptation. We introduce Adaptive Auto-Harness, a framework and system for such streams. The framework decomposes the gap to an oracle harness into evolution loss and adaptation loss. The system addresses these losses with a stateful multi-agent evolver, a harness tree with solve-time routing, and human-steering hooks for cases where history lacks the needed signal. Across prediction-market, security-competition, and event-forecasting streams, Adaptive Auto-Harness outperforms five existing auto-harness baselines and ablations attribute gains to better construction, routing, or targeted human steering. Code is available in https://github.com/A-EVO-Lab/AdaptiveHarness .