DART: Draft-Agreement Routing for Training-Free Adaptive Thinking Budgets in Hybrid Reasoning Models
2026-06-22 • Artificial Intelligence
Artificial Intelligence, Computation and Language
AI summary
The authors developed DART, a method for deciding when a computer model should spend extra time thinking to solve a problem or answer quickly. Unlike previous methods, DART does not need training data and uses simple drafts to guess if more thinking is needed based on how much the drafts disagree. This approach improved accuracy and reduced the time spent thinking on both math and coding tasks. The authors showed that DART works well across different model sizes and types without extra training.
hybrid reasoning, routing, thinking budget, draft entropy, math reasoning, code reasoning, training-free, model scaling, API-based models
Authors
Jungseob Lee, Seongtae Hong, Seungjun Lee, Jaehyung Seo, Junyoung Son, Sugyeong Eo, Chanjun Park, Hyeongju Park, Hyeonseok Moon, Heuiseok Lim
Abstract
Hybrid reasoning models can answer directly or spend extra tokens on extended thinking. A practical router should choose between these modes for each query, so easy problems avoid unnecessary reasoning and hard problems receive enough budget to finish the answer. Existing routers move in this direction, but they typically require labeled training data or fix thinking budgets up front, ignoring answer-level evidence from the model itself. We introduce DART, a training-free routing framework that samples two cheap no-think drafts, accepts direct answering when the drafts agree, and predicts a thinking budget from draft entropy when they disagree. Across the main comparisons, DART preserves or improves always-thinking accuracy in most settings while reducing thinking-token use. On math reasoning, accuracy improves by up to +9.0 points on Olympiad-level problems while thinking tokens drop 15-69%. On code reasoning under execution-based equivalence, accuracy improves by up to +22.5 points while thinking tokens drop 51-63%. The Stage 1 signal extends across model scales (0.6B-32B), model families, and API-only hosted settings, with no labeled data and no gradient updates required.
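The routing rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how agreement is tested, how draft entropy is computed, or how entropy maps to a budget. The sketch assumes, purely for illustration, that agreement is normalized string equality, that draft entropy is the mean per-token Shannon entropy averaged over both drafts, and that the budget is a clipped linear interpolation. The names `route`, `RoutingDecision`, `min_budget`, `max_budget`, and `entropy_cap` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# A draft is (final answer text, per-token probability distributions).
# This representation is an assumption made for the sketch.
Draft = Tuple[str, Sequence[Sequence[float]]]


@dataclass
class RoutingDecision:
    mode: str               # "direct" or "think"
    answer: Optional[str]   # set only when the two drafts agree
    thinking_budget: int    # thinking tokens to allow; 0 for direct answering


def mean_token_entropy(distributions: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (in nats) over per-token distributions."""
    if not distributions:
        return 0.0
    total = 0.0
    for probs in distributions:
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(distributions)


def normalize(answer: str) -> str:
    """Assumed agreement test: case- and whitespace-insensitive equality."""
    return " ".join(answer.strip().lower().split())


def route(
    draft_a: Draft,
    draft_b: Draft,
    min_budget: int = 512,     # hypothetical lower bound on thinking tokens
    max_budget: int = 8192,    # hypothetical upper bound on thinking tokens
    entropy_cap: float = 1.0,  # hypothetical entropy that maps to max_budget
) -> RoutingDecision:
    answer_a, dists_a = draft_a
    answer_b, dists_b = draft_b

    # Drafts agree: accept the direct answer and spend no thinking tokens.
    if normalize(answer_a) == normalize(answer_b):
        return RoutingDecision("direct", answer_a, 0)

    # Drafts disagree: scale the thinking budget with draft entropy.
    entropy = 0.5 * (mean_token_entropy(dists_a) + mean_token_entropy(dists_b))
    ratio = max(0.0, min(1.0, entropy / entropy_cap))
    budget = round(min_budget + ratio * (max_budget - min_budget))
    return RoutingDecision("think", None, budget)
```

In this sketch, two drafts that both answer `42` are routed to direct answering with a budget of zero, while disagreeing drafts receive a budget between `min_budget` and `max_budget` that grows with their entropy.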