ResearchTrend.AI

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

21 May 2025
Wei Liu
Ruochen Zhou
Yiyun Deng
Yuzhen Huang
Junteng Liu
Yuntian Deng
Yizhe Zhang
Junxian He
    OffRL
    LRM

Papers citing "Learn to Reason Efficiently with Adaptive Length-based Reward Shaping"

2 / 2 papers shown
Demystifying Long Chain-of-Thought Reasoning in LLMs
Edward Yeo
Yuxuan Tong
Morry Niu
Graham Neubig
Xiang Yue
OffRL
LRM
112
107
0
05 Feb 2025
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
245
18,685
0
20 Jul 2017