
Control-R: Towards controllable test-time scaling

Main: 8 pages · 5 figures · 5 tables · Bibliography: 3 pages · Appendix: 4 pages
Abstract

This paper addresses the challenges of underthinking and overthinking in long chain-of-thought (CoT) reasoning for Large Reasoning Models (LRMs) by introducing Reasoning Control Fields (RCF), a novel test-time approach that injects structured control signals to guide reasoning from a tree-search perspective. RCF enables models to adjust their reasoning effort according to given control conditions when solving complex tasks. Additionally, we present the Control-R-4K dataset, which consists of challenging problems annotated with detailed reasoning processes and corresponding control fields. To further enhance reasoning control, we propose a Conditional Distillation Finetuning (CDF) method that trains models, in particular Control-R-32B, to effectively adjust reasoning effort at test time. Experimental results on benchmarks such as AIME2024 and MATH500 demonstrate that our approach achieves state-of-the-art performance at the 32B scale while enabling a controllable long CoT reasoning process (L-CoT). Overall, this work introduces an effective paradigm for controllable test-time scaling of reasoning.
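The abstract does not specify the concrete schema of a Reasoning Control Field, so the following is only a minimal sketch of the general idea: serializing control conditions (the field names `effort`, `max_depth`, and `branching` here are illustrative assumptions, not the paper's actual schema) into a structured prefix that a model can condition its chain-of-thought on at test time.

```python
# Hypothetical sketch of test-time "Reasoning Control Field" (RCF) injection.
# The real field schema in Control-R is not given in the abstract; the keys
# below (effort, max_depth, branching) are illustrative assumptions only.

def build_control_field(effort: str = "medium", max_depth: int = 8,
                        branching: int = 2) -> str:
    """Serialize control conditions into a structured text field."""
    return (f"<control effort={effort} max_depth={max_depth} "
            f"branching={branching}>")

def inject_rcf(problem: str, **conditions) -> str:
    """Prepend the control field so the model conditions its reasoning on it."""
    return f"{build_control_field(**conditions)}\n{problem}"

# Example: request a shallow, low-effort reasoning trace for an easy problem.
prompt = inject_rcf("Find all real x with x^2 - 5x + 6 = 0.",
                    effort="low", max_depth=4)
print(prompt)
```

In a tree-search framing, `max_depth` and `branching` would bound how deep and how wide the model explores before committing to an answer, which is one plausible way such a field could curb both underthinking and overthinking.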

@article{zhang2025_2506.00189,
  title={Control-R: Towards controllable test-time scaling},
  author={Di Zhang and Weida Wang and Junxian Li and Xunzhi Wang and Jiatong Li and Jianbo Wu and Jingdi Lei and Haonan He and Peng Ye and Shufei Zhang and Wanli Ouyang and Yuqiang Li and Dongzhan Zhou},
  journal={arXiv preprint arXiv:2506.00189},
  year={2025}
}