2503.04618
Cited By
Better Process Supervision with Bi-directional Rewarding Signals
6 March 2025
Wenxiang Chen, Wei He, Zhiheng Xi, Honglin Guo, Boyang Hong, Jiazheng Zhang, Rui Zheng, Nijun Li, Tao Gui, Yun Li, Qi Zhang, Xuanjing Huang
LRM
Papers citing "Better Process Supervision with Bi-directional Rewarding Signals"
4 / 4 papers shown
Title
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu, Jiahao Lin, Xiangyu Tian, Qichao Zhang, Linjing Li, ..., Nan Xu, Wei He, Xiangyuan Lan, D. Jiang, Dongbin Zhao
LRM
157
6
0
17 Mar 2025
Examining False Positives under Inference Scaling for Mathematical Reasoning
Yu Guang Wang, Nan Yang, Liang Wang, Furu Wei
LRM
144
4
0
10 Feb 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Mingyang Song, Zhaochen Su, Xiaoye Qu, Jiawei Zhou, Yu Cheng
LRM
144
40
0
06 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini
ALM
LRM
299
331
0
03 Jan 2025