2502.19613
Cited By
Self-rewarding correction for mathematical reasoning
26 February 2025
Wei Xiong
Hanning Zhang
Chenlu Ye
Lichang Chen
Nan Jiang
Tong Zhang
ReLM
KELM
LRM
Papers citing "Self-rewarding correction for mathematical reasoning"
8 / 8 papers shown
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu
Hanze Dong
Lei Wang
Doyen Sahoo
Junnan Li
Caiming Xiong
OffRL
LRM
51
1
0
08 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
62
0
0
05 May 2025
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
44
1
0
23 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
J. Z. Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
145
0
0
21 Apr 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan
Ming Li
Lichao Sun
Tianyi Zhou
LRM
51
3
0
09 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang
Haitao Wu
Changqing Zhang
Peilin Zhao
Yatao Bian
ReLM
LRM
79
3
0
08 Apr 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
52
2
0
17 Mar 2025