2504.11343
Cited By
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
15 April 2025
Wei Xiong
Jiarui Yao
Yuhui Xu
Bo Pang
Lei Wang
Doyen Sahoo
Junnan Li
Nan Jiang
Tong Zhang
Caiming Xiong
Hanze Dong
OffRL
LRM
Papers citing
"A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce"
6 / 6 papers shown
Title
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
12
0
0
18 May 2025
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee
Lifan Yuan
Dilek Hakkani-Tur
Hao Peng
7
0
0
16 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
9
0
0
16 May 2025
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Zhiyuan Hu
Yansen Wang
Hanze Dong
Yuhui Xu
Amrita Saha
Caiming Xiong
Bryan Hooi
Junnan Li
LRM
24
0
0
15 May 2025
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu
Hanze Dong
Lei Wang
Doyen Sahoo
Junnan Li
Caiming Xiong
OffRL
LRM
51
2
0
08 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
62
0
0
05 May 2025