ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.16142
  4. Cited By
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning
v1v2 (latest)

Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning

22 May 2025
Shicheng Xu
Liang Pang
Yunchang Zhu
Jia Gu
Zihao Wei
Jingcheng Deng
Feiyang Pan
Huawei Shen
Xueqi Cheng
    OffRLLRM
ArXiv (abs)PDFHTML

Papers citing "Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning"

6 / 6 papers shown
Title
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLMLRM
229
128
0
18 Apr 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen
Haoqin Tu
Fali Wang
Hui Liu
Xianfeng Tang
Xinya Du
Yuyin Zhou
Cihang Xie
ReLMVLMOffRLLRM
163
36
0
10 Apr 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRLLRM
234
217
0
18 Mar 2025
Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment
Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment
Jingcheng Deng
Zhongtao Jiang
Liang Pang
Liwei Chen
Kun Xu
Zihao Wei
Huawei Shen
Xueqi Cheng
112
3
0
17 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
245
128
0
28 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLMVLMOffRLAI4TSLRM
387
2,022
0
22 Jan 2025
1