ResearchTrend.AI
Concise Reasoning via Reinforcement Learning

7 April 2025
Mehdi Fatemi
Banafsheh Rafiee
Mingjie Tang
Kartik Talamadupula
Abstract

Despite significant advancements in large language models (LLMs), a major drawback of reasoning models is their enormous token usage, which increases computational cost, resource requirements, and response time. In this work, we revisit the core principles of reinforcement learning (RL) and, through mathematical analysis, demonstrate that the tendency to generate lengthy responses arises inherently from RL-based optimization during training. This finding challenges the prevailing assumption that longer responses necessarily improve reasoning accuracy. Instead, we uncover a natural correlation between conciseness and accuracy that has been largely overlooked. We show that introducing a secondary phase of RL training, using a very small set of problems, can significantly reduce chains of thought while maintaining or even enhancing accuracy. Additionally, we demonstrate that, while GRPO shares some interesting properties with PPO, it suffers from collapse modes that limit its reliability for concise reasoning. Finally, we validate our conclusions through extensive experimental results.
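The collapse modes attributed to GRPO can be illustrated with its group-relative advantage computation. The sketch below is not the authors' implementation; it is a minimal, assumed rendering of the standard GRPO normalization (reward minus group mean, divided by group standard deviation), showing how a group whose responses all receive the same reward yields near-zero advantages and hence no learning signal:

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages in the style of GRPO: each sampled
    response's reward is normalized by the mean and standard deviation
    of its group of G responses to the same prompt.

    eps is a small constant (an illustrative choice here) that avoids
    division by zero when the group has no reward variance.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# A mixed group produces informative advantages:
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))   # [ 1. -1.  1. -1.]

# A degenerate group (all responses wrong, or all correct) has zero
# variance, so every advantage is ~0 and the update carries no signal:
print(grpo_advantages([0.0, 0.0, 0.0, 0.0]))   # [0. 0. 0. 0.]
```

Whether this zero-variance degeneracy is the specific collapse mode analyzed in the paper is an assumption; the abstract only states that such modes exist and limit GRPO's reliability for concise reasoning.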

@article{fatemi2025_2504.05185,
  title={Concise Reasoning via Reinforcement Learning},
  author={Mehdi Fatemi and Banafsheh Rafiee and Mingjie Tang and Kartik Talamadupula},
  journal={arXiv preprint arXiv:2504.05185},
  year={2025}
}