Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning

20 February 2025
Jiachen Zhu
Congmin Zheng
Jianghao Lin
Kounianhua Du
Ying Wen
Yong Yu
Jun Wang
Weinan Zhang
Abstract

While large language models (LLMs) have significantly advanced mathematical reasoning, Process Reward Models (PRMs) have been developed to evaluate the logical validity of individual reasoning steps. However, PRMs still struggle with out-of-distribution (OOD) inputs. This paper identifies two key OOD issues: step OOD, caused by differences in reasoning patterns across model types and sizes, and question OOD, which arises from dataset shifts between training data and real-world problems. To address these issues, we introduce the Retrieval-Augmented Process Reward Model (RetrievalPRM). Using a two-stage retrieval-enhanced mechanism, RetrievalPRM retrieves semantically similar questions and steps as a warmup, improving the PRM's ability to evaluate target steps and strengthening generalization and reasoning consistency across models and problem types. Extensive experiments demonstrate that RetrievalPRM outperforms existing baselines across multiple real-world datasets. Our open-source contributions include a retrieval-enhanced dataset, a tuning framework for PRM training, and the RetrievalPRM model, establishing a new standard for PRM performance.
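To make the two-stage mechanism concrete, here is a minimal sketch of how question-level and step-level retrieval might be prepended to the PRM input. The retriever backbone (a sentence-transformers encoder), the corpus format, the top_k helper, and the prompt layout are all illustrative assumptions, not the paper's exact implementation.

# Hypothetical sketch of RetrievalPRM's two-stage retrieval warmup:
# (1) retrieve questions similar to the target question (question OOD),
# (2) retrieve reasoning steps similar to the step under evaluation (step OOD),
# then prepend both as context before the PRM scores the step.
# Retriever, corpora, and prompt format are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed retriever backbone

def top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus entries most similar to the query by cosine similarity."""
    embs = encoder.encode([query] + corpus, normalize_embeddings=True)
    scores = embs[1:] @ embs[0]          # cosine similarity (embeddings are unit norm)
    return [corpus[i] for i in np.argsort(-scores)[:k]]

def build_prm_input(question: str, step: str,
                    question_bank: list[str], step_bank: list[str]) -> str:
    # Stage 1: question-level retrieval.
    similar_questions = top_k(question, question_bank)
    # Stage 2: step-level retrieval.
    similar_steps = top_k(step, step_bank)
    # Prepend retrieved exemplars as a warmup context for the PRM.
    context = "\n".join(
        ["Reference questions:"] + similar_questions
        + ["Reference steps:"] + similar_steps
    )
    return f"{context}\nQuestion: {question}\nStep to judge: {step}"

The string returned by build_prm_input would then be scored by the PRM in place of the bare question-step pair, which is what gives the model its retrieval-augmented warmup.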

@article{zhu2025_2502.14361,
  title={Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning},
  author={Jiachen Zhu and Congmin Zheng and Jianghao Lin and Kounianhua Du and Ying Wen and Yong Yu and Jun Wang and Weinan Zhang},
  journal={arXiv preprint arXiv:2502.14361},
  year={2025}
}