Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning

While reasoning-augmented large language models (RLLMs) significantly enhance complex task performance through extended reasoning chains, they inevitably introduce substantial unnecessary token consumption, particularly for simpler problems where Short Chain-of-Thought (Short CoT) suffices. This overthinking phenomenon leads to inefficient resource usage without proportional accuracy gains. To address this issue, we propose Self-Route, a dynamic reasoning framework that automatically selects between general and reasoning modes based on model capability estimation. Our approach introduces a lightweight pre-inference stage that extracts capability-aware embeddings from hidden-layer representations, enabling real-time evaluation of the model's ability to solve a given problem. We further construct Gradient-10K, a dataset built on model-based difficulty estimation with dense complexity sampling, to train the router for precise capability boundary detection. Extensive experiments demonstrate that Self-Route achieves accuracy comparable to reasoning models while reducing token consumption by 30–55% across diverse benchmarks. The proposed framework is consistently effective across models with different parameter scales and reasoning paradigms, highlighting its general applicability and practical value.
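The routing idea described above can be sketched minimally: pool hidden-layer representations into a capability-aware embedding, then apply a lightweight probe that decides whether the model should answer in general (Short CoT) mode or fall back to the full reasoning mode. The pooling operation, the sigmoid probe, and all parameter names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

HIDDEN_DIM = 64  # stand-in for the backbone model's hidden size

def capability_embedding(hidden_states: np.ndarray) -> np.ndarray:
    """Pool per-token hidden states from the lightweight pre-inference
    stage into one capability vector (mean pooling is an assumption)."""
    return hidden_states.mean(axis=0)

def route(embedding: np.ndarray, w: np.ndarray, b: float,
          threshold: float = 0.5) -> str:
    """Return 'general' if the probe estimates the problem is within the
    model's Short-CoT capability, else 'reasoning' (hypothetical probe)."""
    p_solvable = 1.0 / (1.0 + np.exp(-(embedding @ w + b)))  # sigmoid score
    return "general" if p_solvable >= threshold else "reasoning"

# Toy usage with random stand-in values (a real router would use probe
# weights trained on a difficulty-graded dataset such as Gradient-10K).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(10, HIDDEN_DIM))  # 10 tokens of hidden states
w = rng.normal(size=HIDDEN_DIM)
mode = route(capability_embedding(hidden), w, b=0.0)
print(mode)
```

In this sketch only the small probe runs before generation, so the routing overhead is a single pooled forward pass plus a dot product, which is what makes mode selection cheap relative to a full reasoning chain.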
@article{he2025_2505.20664,
  title={Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning},
  author={Yang He and Xiao Ding and Bibo Cai and Yufei Zhang and Kai Xiong and Zhouhao Sun and Bing Qin and Ting Liu},
  journal={arXiv preprint arXiv:2505.20664},
  year={2025}
}