Value-Guided Search for Efficient Chain-of-Thought Reasoning

23 May 2025
Kaiwen Wang
Jin Peng Zhou
Jonathan D. Chang
Zhaolin Gao
Nathan Kallus
Kianté Brantley
Wen Sun
Author Contacts: kw437@cornell.edu, jz563@cornell.edu
Main: 9 pages · 20 figures · 6 tables · Bibliography: 4 pages · Appendix: 16 pages
Abstract

In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it to DeepSeek models for improved performance with test-time compute scaling. We find that block-wise value-guided search (VGS) with a final weighted majority vote achieves better test-time scaling than standard methods such as majority voting or best-of-n. With an inference budget of 64 generations, VGS with DeepSeek-R1-Distill-1.5B achieves an average accuracy of 45.7% across four competition math benchmarks (AIME 2024 & 2025, HMMT Feb 2024 & 2025), reaching parity with o3-mini-medium. Moreover, VGS significantly reduces the inference FLOPs required to achieve the same performance as majority voting. Our dataset, model, and codebase are open-sourced.
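The block-wise search described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate_block`, `value`, and `extract_answer` are hypothetical stand-ins for the reasoning model, the 1.5B token-level value model, and answer parsing, and the beam widths are illustrative defaults rather than the paper's settings.

```python
from collections import defaultdict

def vgs(generate_block, value, extract_answer,
        n_beams=4, n_expand=2, max_blocks=8):
    """Block-wise value-guided search sketch: expand each partial trace by
    candidate blocks, keep the n_beams highest-value prefixes, then pick the
    final answer by a value-weighted majority vote over surviving traces."""
    beams = [""]
    for _ in range(max_blocks):
        candidates = []
        for prefix in beams:
            for _ in range(n_expand):
                trace = prefix + generate_block(prefix)
                candidates.append((value(trace), trace))
        # Keep the highest-scoring partial traces according to the value model.
        candidates.sort(key=lambda vt: vt[0], reverse=True)
        beams = [trace for _, trace in candidates[:n_beams]]
    # Final weighted majority vote: each trace's answer is weighted by its value.
    votes = defaultdict(float)
    for trace in beams:
        votes[extract_answer(trace)] += value(trace)
    return max(votes, key=votes.get)

# Toy usage: a degenerate generator that always emits "a"; the value model
# scores a trace by how many "a" tokens it contains, and the "answer" is the
# trace's last character.
best = vgs(lambda p: "a", lambda t: t.count("a"), lambda t: t[-1])
# best == "a"
```

In the paper's setting the per-block value scores come from a single learned token-level model, which avoids the step-segmentation problem that PRMs face on long reasoning traces; the beam update above is where those scores guide the search.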

@article{wang2025_2505.17373,
  title={Value-Guided Search for Efficient Chain-of-Thought Reasoning},
  author={Kaiwen Wang and Jin Peng Zhou and Jonathan Chang and Zhaolin Gao and Nathan Kallus and Kianté Brantley and Wen Sun},
  journal={arXiv preprint arXiv:2505.17373},
  year={2025}
}