ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2503.18892
  4. Cited By
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

24 March 2025
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
    OffRL
    ReLM
    LRM
ArXivPDFHTML

Papers citing "SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild"

50 / 97 papers shown
Title
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Jian Yao
Ran Cheng
Xingyu Wu
Jibin Wu
Kay Chen Tan
LRM
76
0
0
29 May 2025
Pitfalls of Rule- and Model-based Verifiers -- A Case Study on Mathematical Reasoning
Pitfalls of Rule- and Model-based Verifiers -- A Case Study on Mathematical Reasoning
Yuzhen Huang
Weihao Zeng
Xingshan Zeng
Qi Zhu
Junxian He
LRM
65
0
0
28 May 2025
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
Tian Qin
Core Francisco Park
Mujin Kwun
Aaron Walsman
Eran Malach
Nikhil Anand
Hidenori Tanaka
David Alvarez-Melis
ReLM
OffRL
LRM
57
0
0
28 May 2025
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
Hanting Chen
Yasheng Wang
Kai Han
Dong Li
Lin Li
...
Hailin Hu
Yehui Tang
Dacheng Tao
Xinghao Chen
Yunhe Wang
LRM
73
0
0
28 May 2025
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley
Mingyu Chen
Zhaolin Gao
Jason D. Lee
Wen Sun
Wenhao Zhan
Xuezhou Zhang
OffRL
LRM
62
0
0
27 May 2025
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
Mingyang Song
Mao Zheng
OffRL
LRM
71
0
0
27 May 2025
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving
Muxi Diao
Lele Yang
Hongbo Yin
Zhexu Wang
Yejie Wang
Daxin Tian
Kongming Liang
Zhanyu Ma
VLM
LRM
48
0
0
27 May 2025
ARM: Adaptive Reasoning Model
ARM: Adaptive Reasoning Model
Siye Wu
Jian Xie
Yikai Zhang
Aili Chen
Kai Zhang
Yu Su
Yanghua Xiao
LRM
64
0
0
26 May 2025
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
Rihui Xin
Han Liu
Zecheng Wang
Yupeng Zhang
Dianbo Sui
Xiaolin Hu
Bingning Wang
SyDa
44
1
0
26 May 2025
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Junteng Liu
Yuanxiang Fan
Z. L. Jiang
Han Ding
Yongyi Hu
...
Yunan Huang
Mozhi Zhang
Pengyu Zhao
Junjie Yan
Junxian He
OffRL
NAI
SyDa
LRM
ELM
37
2
0
26 May 2025
Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners
Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners
Jiabao Ji
Yongchao Chen
Yang Zhang
Ramana Rao Kompella
Chuchu Fan
Gaowen Liu
Shiyu Chang
79
0
0
26 May 2025
One-shot Entropy Minimization
One-shot Entropy Minimization
Zitian Gao
Lynx Chen
Joey Zhou
Bryan Dai
28
3
0
26 May 2025
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Mingyuan Wu
Jingcheng Yang
Jize Jiang
Meitang Li
Kaizhuo Yan
Hanchao Yu
Minjia Zhang
Chengxiang Zhai
Klara Nahrstedt
LRM
122
0
0
25 May 2025
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
Wenlong Deng
Yi Ren
Muchen Li
Danica J. Sutherland
Xiaoxiao Li
Christos Thrampoulidis
50
0
0
24 May 2025
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL
LRM
228
2
0
23 May 2025
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
Yutong Chen
Jiandong Gao
Ji Wu
ALM
201
0
0
23 May 2025
Stable Reinforcement Learning for Efficient Reasoning
Muzhi Dai
Shixuan Liu
Qingyi Si
OffRL
LRM
104
0
0
23 May 2025
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Amirhosein Ghasemabadi
Keith G. Mills
Baochun Li
Di Niu
LRM
81
0
0
23 May 2025
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Fanqi Wan
Weizhou Shen
Shengyi Liao
Yingcheng Shi
Chenliang Li
Ziyi Yang
Ji Zhang
Fei Huang
Jingren Zhou
Ming Yan
OffRL
LLMAG
ReLM
LRM
90
0
0
23 May 2025
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
Zhepei Wei
Wenlin Yao
Yao Liu
Weizhi Zhang
Qin Lu
...
Puyang Xu
Chao Zhang
Bing Yin
Hyokun Yun
Lihong Li
OffRL
CLL
OnRL
LRM
72
4
0
22 May 2025
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Guanting Dong
Yifei Chen
Xiaoxi Li
Jiajie Jin
Hongjin Qian
Yutao Zhu
Hangyu Mao
Guorui Zhou
Ji-Rong Wen
Ji-Rong Wen
LLMAG
SyDa
LRM
97
0
0
22 May 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
81
0
0
22 May 2025
Effective Reinforcement Learning for Reasoning in Language Models
Effective Reinforcement Learning for Reasoning in Language Models
Lianghuan Huang
Shuo Li
Sagnik Anupam
Insup Lee
Osbert Bastani
LRM
54
0
0
22 May 2025
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Wang Yang
Zirui Liu
Hongye Jin
Qingyu Yin
Vipin Chaudhary
Xiaotian Han
ReLM
LRM
52
0
0
22 May 2025
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
Wei Liu
Siya Qi
Xinyu Wang
Chen Qian
Yali Du
Yulan He
OffRL
LRM
74
0
0
21 May 2025
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities
Jinyang Wu
Chonghua Liao
Mingkuan Feng
Shuai Zhang
Zhengqi Wen
Pengpeng Shao
Huazhe Xu
Jianhua Tao
LRM
OffRL
99
3
0
21 May 2025
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Zihao Li
Xu Wang
Yuzhe Yang
Ziyu Yao
Haoyi Xiong
Jundong Li
LLMSV
LRM
107
1
0
21 May 2025
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Yurun Yuan
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
113
1
0
21 May 2025
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Minwu Kim
Anubhav Shrestha
Safal Shrestha
Aadim Nepal
Keith Ross
49
0
0
20 May 2025
General-Reasoner: Advancing LLM Reasoning Across All Domains
General-Reasoner: Advancing LLM Reasoning Across All Domains
Xueguang Ma
Qian Liu
Dongfu Jiang
Ge Zhang
Zejun Ma
Wenhu Chen
AI4CE
LRM
88
5
0
20 May 2025
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Penghui Qi
Zichen Liu
Tianyu Pang
Chao Du
W. Lee
Min Lin
OffRL
LRM
61
2
0
19 May 2025
Thinkless: LLM Learns When to Think
Thinkless: LLM Learns When to Think
Gongfan Fang
Xinyin Ma
Xinchao Wang
LLMAG
OffRL
ReLM
LRM
98
3
0
19 May 2025
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu
Tian Liang
Zhiwei He
Jiahao Xu
Wenxuan Wang
Pinjia He
Zhaopeng Tu
Haitao Mi
Dong Yu
OffRL
ReLM
LRM
93
0
0
19 May 2025
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu
Xueyi Chen
Rui Ming
Jilong Gao
Shoubo Hu
Zhuolun He
Bei Yu
LRM
91
0
0
19 May 2025
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha
Minwu Kim
Aadim Nepal
Anubhav Shrestha
Keith Ross
OffRL
ReLM
LRM
67
0
0
19 May 2025
Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability
Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability
Jingyi Ren
Yekun Xu
Xiaolong Wang
Weitao Li
Weizhi Ma
Yang Liu
RALM
70
0
0
19 May 2025
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
Hengli Li
Chenxi Li
Tong Wu
Xuekai Zhu
Yuxuan Wang
...
Eric Hanchen Jiang
Song-Chun Zhu
Zixia Jia
Ying Nian Wu
Zilong Zheng
LRM
104
1
0
19 May 2025
Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs
Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs
Zhuo Yang
Lingli Ge
Dong Han
Tianfan Fu
Yuqiang Li
55
0
0
19 May 2025
MR. Judge: Multimodal Reasoner as a Judge
MR. Judge: Multimodal Reasoner as a Judge
Renjie Pi
Felix Bai
Qibin Chen
Simon Wang
Jiulong Shan
Kieran Liu
Meng Cao
ELM
LRM
102
0
0
19 May 2025
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
82
2
0
18 May 2025
Efficient RL Training for Reasoning Models via Length-Aware Optimization
Efficient RL Training for Reasoning Models via Length-Aware Optimization
Danlong Yuan
Tian Xie
Shaohan Huang
Zhuocheng Gong
Huishuai Zhang
Chong Luo
Furu Wei
Dongyan Zhao
OffRL
LRM
VLM
71
1
0
18 May 2025
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee
Lifan Yuan
Dilek Hakkani-Tur
Hao Peng
100
0
0
16 May 2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu
Jiahao Lin
Qichao Zhang
Xiangyu Tian
Linjing Li
Xiangyuan Lan
Dongbin Zhao
OffRL
ReLM
LRM
93
3
0
16 May 2025
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
Chengyu Huang
Zhengxin Zhang
Claire Cardie
LRM
106
0
0
16 May 2025
Beyond Áha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Beyond Áha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Zhiyuan Hu
Yansen Wang
Hanze Dong
Yuhui Xu
Amrita Saha
Caiming Xiong
Bryan Hooi
Junnan Li
LRM
76
2
0
15 May 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Enci Zhang
Xingang Yan
Wei Lin
Tianxiang Zhang
Qianchun Lu
LRM
63
0
0
13 May 2025
Learning from Peers in Reasoning Models
Learning from Peers in Reasoning Models
Tongxu Luo
Wenyu Du
Jiaxi Bi
Stephen Chung
Zhengyang Tang
Hao Yang
M. Zhang
Benyou Wang
LRM
64
0
0
12 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Yelong Shen
OffRL
ReLM
LRM
278
34
0
29 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
196
24
0
24 Apr 2025
TTRL: Test-Time Reinforcement Learning
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
377
26
0
22 Apr 2025
12
Next