
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data

10 February 2025
Thomas Zeng
Shuibai Zhang
Shutong Wu
Christian Classen
Daewon Chae
Ethan Ewer
Minjae Lee
Heeju Kim
Wonjun Kang
Jackson Kunde
Ying Fan
Jungtaek Kim
Hyung Il Koo
Kannan Ramchandran
Dimitris Papailiopoulos
Kangwook Lee
Abstract

Process Reward Models (PRMs) have proven effective at enhancing mathematical reasoning for Large Language Models (LLMs) by leveraging increased inference-time computation. However, they are predominantly trained on mathematical data, and their generalizability to non-mathematical domains has not been rigorously studied. In response, this work first shows that current PRMs have poor performance in other domains. To address this limitation, we introduce VersaPRM, a multi-domain PRM trained on synthetic reasoning data generated using our novel data generation and annotation method. VersaPRM achieves consistent performance gains across diverse domains. For instance, in the MMLU-Pro category of Law, VersaPRM, via weighted majority voting, achieves a 7.9% performance gain over the majority voting baseline, surpassing Qwen2.5-Math-PRM's gain of 1.3%. We further contribute to the community by open-sourcing all data, code, and models for VersaPRM.
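The weighted majority voting mentioned above can be sketched as follows: instead of giving each sampled answer one vote, each candidate's vote is weighted by a score derived from the PRM's per-step judgments. This is a minimal illustration, not the paper's implementation; the per-step aggregation (here left to the caller) is an assumption, as PRM scores are commonly reduced via the minimum or product of step probabilities.

```python
from collections import defaultdict

def weighted_majority_vote(candidates):
    """Select the answer whose candidates' PRM scores sum highest.

    candidates: list of (answer, prm_score) pairs. Each prm_score is
    assumed to be a scalar already aggregated over the candidate's
    reasoning steps (e.g. min or product of per-step PRM probabilities;
    the exact aggregation is an assumption for this sketch).
    """
    totals = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += score
    # Plain majority voting is the special case where every score is 1.
    return max(totals, key=totals.get)

# Four sampled chains-of-thought with hypothetical PRM scores:
samples = [("A", 0.9), ("B", 0.4), ("B", 0.5), ("A", 0.8)]
print(weighted_majority_vote(samples))  # "A" wins: 1.7 vs 0.9
```

Note that unweighted majority voting over these samples would tie 2-2; the PRM scores break the tie in favor of the more confidently verified reasoning chains.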

@article{zeng2025_2502.06737,
  title={VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data},
  author={Thomas Zeng and Shuibai Zhang and Shutong Wu and Christian Classen and Daewon Chae and Ethan Ewer and Minjae Lee and Heeju Kim and Wonjun Kang and Jackson Kunde and Ying Fan and Jungtaek Kim and Hyung Il Koo and Kannan Ramchandran and Dimitris Papailiopoulos and Kangwook Lee},
  journal={arXiv preprint arXiv:2502.06737},
  year={2025}
}