Agent-as-a-Router: Agentic Model Routing for Coding Tasks

2026-06-22 | Artificial Intelligence

AI summary

The authors study how to route each task to the most suitable large language model (LLM) when several are available, since no single model excels at everything. They find that giving a router performance statistics for each task dimension improves its routing decisions. Building on this, they propose Agent-as-a-Router, a framework that learns from task outcomes observed during deployment to make better routing decisions over time. Their method, ACRouter, includes components that orchestrate decisions, verify results, and remember past experience. Experiments show that ACRouter makes better choices than previous routers and generalizes to new, out-of-distribution tasks.

Large Language Models, Routing, Performance Statistics, Context-Action-Feedback loop, Agent-based Systems, Task Allocation, Regret Minimization, Out-of-Distribution Generalization, Benchmarking, Memory Module
Authors
Pengfei Zhou, Zhiwei Tang, Yixing Ma, Jiasheng Tang, Yizeng Han, Zhenglin Wan, Fanqing Meng, Wei Wang, Bohan Zhuang, Wangbo Zhao, Yang You
Abstract
Real-world users typically have access to multiple Large Language Models (LLMs) from different providers. These LLMs often excel in distinct domains, yet none dominates across all of them. Consequently, routing each task to the most suitable model becomes critical for both performance and cost. Existing routers treat this as a static, one-off classification problem. However, we identify the performance bottleneck for these routers as an information deficit: simply augmenting a vanilla LLM router with performance statistics at the task-dimension level yields a 15.3% relative gain, surpassing a heuristic router built on the same dimension-level priors. Motivated by this finding, we propose Agent-as-a-Router, a framework that formalizes routing as a C-A-F loop (Context->Action->Feedback->Context). It closes the information gap by accumulating execution-grounded experience during deployment. We instantiate this framework as ACRouter, composed of an Orchestrator, a Verifier, and a Memory module. We also introduce CodeRouterBench, an evaluation environment comprising ~10K task instances with verified scores from 8 frontier LLMs, enabling regret-based router comparison on streaming tasks. Experiments show that ACRouter achieves the lowest cumulative regret on in-distribution tasks and generalizes to out-of-distribution agentic-programming tasks, demonstrating that our routing framework actively closes the information gap. Code and benchmarks are released at https://github.com/LanceZPF/agent-as-a-router.
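To make the C-A-F loop and the regret-based comparison concrete, here is a minimal Python sketch. It is not the ACRouter implementation: the names `Memory`, `route`, and `run_stream` are hypothetical, the routing policy is a plain greedy rule with an optimistic prior standing in for the Orchestrator, and the feedback is assumed to be a lookup of a verified per-model score rather than the output of a Verifier. Per-step regret is assumed here to be the gap between the best available score and the score of the chosen model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Running mean score per (task dimension, model) pair (hypothetical helper)."""
    totals: dict = field(default_factory=lambda: defaultdict(float))
    counts: dict = field(default_factory=lambda: defaultdict(int))

    def mean(self, dimension, model, default=1.0):
        key = (dimension, model)
        if self.counts[key] == 0:
            # Optimistic prior (an assumption): untried models get tried once.
            return default
        return self.totals[key] / self.counts[key]

    def update(self, dimension, model, score):
        key = (dimension, model)
        self.totals[key] += score
        self.counts[key] += 1


def route(memory, dimension, models):
    """Action: pick the model with the best remembered mean for this dimension."""
    return max(models, key=lambda m: memory.mean(dimension, m))


def run_stream(tasks, models):
    """Run the loop over a stream of (dimension, {model: verified_score}) pairs."""
    memory = Memory()
    cumulative_regret = 0.0
    for dimension, scores in tasks:
        chosen = route(memory, dimension, models)    # Context -> Action
        feedback = scores[chosen]                    # Action -> Feedback
        memory.update(dimension, chosen, feedback)   # Feedback -> Context
        # Regret: best achievable score minus the score actually obtained.
        cumulative_regret += max(scores.values()) - feedback
    return cumulative_regret, memory
```

On a stream where one model is consistently better for a given dimension, this sketch pays regret only while it is still trying models it has no record of, then settles on the stronger one. A static router with no memory would repeat the same choice regardless of the outcomes it observes.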