2503.04548
An Empirical Study on Eliciting and Improving R1-like Reasoning Models
6 March 2025
Zhongfu Chen
Yingqian Min
Beichen Zhang
Jie Chen
Jinhao Jiang
Daixuan Cheng
Wayne Xin Zhao
Zhengyang Liang
Xu Miao
Yaojie Lu
Lei Fang
Zhongyuan Wang
Ji-Rong Wen
ReLM
OffRL
LRM
Papers citing "An Empirical Study on Eliciting and Improving R1-like Reasoning Models"
21 / 21 papers shown
Title
Pedagogy-R1: Pedagogically-Aligned Reasoning Model with Balanced Educational Benchmark
Unggi Lee
Jaeyong Lee
Jiyeong Bae
Yeil Jeong
Junbo Koh
Gyeonggeon Lee
Gunho Lee
Taekyung Ahn
Hyeoncheol Kim
LRM
41
0
0
24 May 2025
Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yijiao Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
65
0
0
23 May 2025
DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation
Bowen Zheng
Xiaolei Wang
Enze Liu
Xi Wang
Lu Hongyu
Yu Chen
Wayne Xin Zhao
Ji-Rong Wen
88
0
0
22 May 2025
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models
Yakun Zhu
Zhongzhen Huang
Linjie Mu
Yutong Huang
Wei Nie
Jiaji Liu
Shaoting Zhang
Pengfei Liu
Xiaofan Zhang
LM&MA
ELM
LRM
126
0
0
20 May 2025
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
Bin Yu
Hang Yuan
Haotian Li
X. Xu
Yuliang Wei
Bailing Wang
Weizhen Qi
Kai Chen
LRM
95
4
0
06 May 2025
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
Xinyu Zhang
Jiadong Wang
Zifei Cheng
Wenhao Zhuang
Zheng Lin
...
Shouyu Yin
Chaohang Wen
Haotian Zhang
Bin Chen
Bing Yu
LRM
143
13
0
19 Apr 2025
Slow Thinking for Sequential Recommendation
Junjie Zhang
Beichen Zhang
Wenqi Sun
Hongyu Lu
Wayne Xin Zhao
Yu Chen
Ji-Rong Wen
OffRL
LRM
99
1
0
13 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Dinesh Manocha
Jieyu Zhao
LRM
153
15
0
07 Apr 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
176
137
0
24 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
206
218
0
18 Mar 2025
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Huatong Song
Jinhao Jiang
Yingqian Min
Jie Chen
Zhongfu Chen
Wayne Xin Zhao
Lei Fang
Ji-Rong Wen
AI4TS
LRM
KELM
175
43
0
07 Mar 2025
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Zican Dong
Junyi Li
Jinhao Jiang
Mingyu Xu
Wayne Xin Zhao
Bin Wang
Xin Wu
VLM
339
5
0
11 Feb 2025
Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
Sadegh Mahdavi
Muchen Li
Kaiwen Liu
Christos Thrampoulidis
Leonid Sigal
Renjie Liao
LRM
63
10
0
24 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL
ALM
AI4TS
VLM
LRM
285
338
0
22 Jan 2025
T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
Zhenyu Hou
Xin Lv
Rui Lu
Jing Zhang
Yongqian Li
Zijun Yao
Juanzi Li
J. Tang
Yuxiao Dong
OffRL
LRM
ReLM
140
33
0
20 Jan 2025
Enhancing LLM Reasoning with Reward-guided Tree Search
Jinhao Jiang
Zhongfu Chen
Yingqian Min
Jie Chen
Xiaoxue Cheng
...
Zhengyang Liang
Dong Yan
Jian Xie
Ziyi Wang
Ji-Rong Wen
LRM
143
33
0
03 Jan 2025
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen
Jiahao Xu
Tian Liang
Zhiwei He
Jianhui Pang
...
Zizhuo Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
ReLM
182
197
0
30 Dec 2024
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models
Bofei Gao
Feifan Song
Zhiyong Yang
Zefan Cai
Yibo Miao
...
Lei Sha
Yichang Zhang
Xuancheng Ren
Tianyu Liu
Baobao Chang
ELM
LRM
88
65
0
10 Oct 2024
HybridFlow: A Flexible and Efficient RLHF Framework
Guangming Sheng
Chi Zhang
Zilingfeng Ye
Xibin Wu
Wang Zhang
Ru Zhang
Size Zheng
Haibin Lin
Chuan Wu
AI4CE
186
241
0
28 Sep 2024
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu
Xibin Wu
Weixun Wang
OpenLLMAI Team
Dehao Zhang
Yu Cao
AI4CE
VLM
99
130
0
20 May 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
886
13,207
0
04 Mar 2022