2505.15612
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
21 May 2025
Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He
OffRL, LRM

Papers citing "Learn to Reason Efficiently with Adaptive Length-based Reward Shaping" (2 of 2 shown)

Demystifying Long Chain-of-Thought Reasoning in LLMs
Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue
OffRL, LRM
117 · 107 · 0
05 Feb 2025

Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
OffRL
248 · 18,685 · 0
20 Jul 2017