arXiv:2402.05808
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
8 February 2024
Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang
LRM
Papers citing "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning"
24 / 24 papers shown
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Shihan Dou, Muling Wu, Jingwen Xu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
OffRL
LRM
32
0
0
19 Apr 2025
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
Zhenting Wang, Guofeng Cui, Kun Wan, Wentian Zhao
35
1
0
13 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang, Haitao Wu, Changqing Zhang, Peilin Zhao, Yatao Bian
ReLM
LRM
79
3
0
08 Apr 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan, Yu-Hu Li, Honglin Lin, Qizhi Pei, Zinan Tang, Wei Yu Wu, Chenlin Ming, H. V. Zhao, Conghui He, Lijun Wu
LRM
59
1
0
21 Mar 2025
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning
Chen Li, Yinyi Luo, Anudeep Bolimera, Uzair Ahmed, Siyang Song, Hrishikesh Gokhale, Marios Savvides
LRM
AI4CE
65
1
0
06 Mar 2025
Improving Retrospective Language Agents via Joint Policy Gradient Optimization
Xueyang Feng, Bo Lan, Quanyu Dai, Lei Wang, Jiakai Tang, X. Chen, Zhenhua Dong, Zhicheng Dou
LLMAG
67
0
0
03 Mar 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding, Zhiheng Xi, Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang
LRM
75
3
0
24 Feb 2025
Improving Value-based Process Verifier via Structural Prior Injection
Zetian Sun, Dongfang Li, Baotian Hu, Jun Yu, Min-Ling Zhang
44
0
0
21 Feb 2025
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
Jinyang Wu, Mingkuan Feng, Shuai Zhang, Feihu Che, Zengqi Wen, J. Tao
ReLM
LRM
112
9
0
27 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, ..., Jinguo Zhu, X. Zhu, Lewei Lu, Yu Qiao, Jifeng Dai
LRM
62
48
1
15 Nov 2024
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang
LRM
56
5
0
24 Oct 2024
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards
Heejin Do, Sangwon Ryu, Gary Geunbae Lee
31
2
0
26 Sep 2024
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
Yuanzhao Zhai, Tingkai Yang, Kele Xu, Dawei Feng, Cheng Yang, Bo Ding, Huaimin Wang
117
9
0
14 Sep 2024
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
Zimu Lu, Aojun Zhou, Ke Wang, Houxing Ren, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li
LRM
42
23
0
30 Jun 2024
Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples
Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin
LRM
AI4CE
101
10
0
09 Jun 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, ..., Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
LLMAG
LM&Ro
38
29
0
06 Jun 2024
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
Jun Zhao, Jingqi Tong, Yurong Mou, Ming Zhang, Qi Zhang, Xuanjing Huang
LRM
50
4
0
05 May 2024
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li
ReLM
LRM
28
9
0
04 Mar 2024
Design of Chain-of-Thought in Math Problem Solving
Zhanming Jie, Trung Quoc Luong, Xinbo Zhang, Xiaoran Jin, Hang Li
LRM
55
11
0
20 Sep 2023
Self-Evaluation Guided Beam Search for Reasoning
Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, Qizhe Xie
LRM
166
129
0
01 May 2023
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, S. Hoi
SyDa
ALM
129
240
0
05 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
ReLM
LRM
328
4,077
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM
ALM
319
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
389
8,495
0
28 Jan 2022