arXiv:2508.05004
R-Zero: Self-Evolving Reasoning LLM from Zero Data
7 August 2025
Chengsong Huang
Wenhao Yu
Xiaoyang Wang
H. Zhang
Zongxia Li
Ruosen Li
J. Huang
Haitao Mi
Dong Yu
ReLM
SyDa
LRM
HuggingFace (106 upvotes)
Github (592★)
Papers citing "R-Zero: Self-Evolving Reasoning LLM from Zero Data"
24 / 24 papers shown
Title
OpenSIR: Open-Ended Self-Improving Reasoner
Wai-Chung Kwan
Joshua Ong Jun Leang
Pavlos Vougiouklis
Jeff Z. Pan
Marco Valentino
Pasquale Minervini
ReLM
LRM
28
0
0
01 Nov 2025
Towards Understanding Self-play for LLM Reasoning
Justin Yang Chae
Md Tanvirul Alam
Nidhi Rastogi
ReLM
LRM
98
0
0
31 Oct 2025
Automating Benchmark Design
Amanda Dsouza
Harit Vishwakarma
Zhengyang Qi
Justin Bauer
Derek Pham
Thomas Walshe
Armin Parchami
Frederic Sala
P. Varma
4
0
0
28 Oct 2025
SPICE: Self-Play In Corpus Environments Improves Reasoning
Bo Liu
Chuanyang Jin
Seungone Kim
Weizhe Yuan
Wenting Zhao
Ilia Kulikov
Xian Li
Sainbayar Sukhbaatar
Jack Lanchantin
Jason Weston
ReLM
LRM
40
1
0
28 Oct 2025
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Yixing Chen
Yiding Wang
Siqi Zhu
Haofei Yu
Tao Feng
Muhan Zhang
M. Patwary
Jiaxuan You
LLMAG
LRM
114
0
0
27 Oct 2025
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Hongliang Lu
Yuhang Wen
Pengyu Cheng
Ruijin Ding
Haotian Xu
Jiaqi Guo
Chutian Wang
Haonan Chen
Xiaoxi Jiang
Guanjun Jiang
LRM
28
0
0
21 Oct 2025
MENTOR: A Reinforcement Learning Framework for Enabling Tool Use in Small Models via Teacher-Optimized Rewards
Changsu Choi
Hoyun Song
Dongyeon Kim
WooHyeon Jung
Minkyung Cho
Sunjin Park
NohHyeob Bae
Seona Yu
Kyungtae Lim
48
0
0
21 Oct 2025
Deep Self-Evolving Reasoning
Zihan Liu
Shun Zheng
Xumeng Wen
Yang Wang
Jiang Bian
Mao Yang
ReLM
LRM
51
0
0
20 Oct 2025
Towards Agentic Self-Learning LLMs in Search Environment
Wangtao Sun
Xiang Cheng
Jialin Fan
Yao Xu
Xing Yu
Shizhu He
Jun Zhao
Kang Liu
15
0
0
16 Oct 2025
Diagnosing and Mitigating System Bias in Self-Rewarding RL
Chuyi Tan
Peiwen Yuan
Xinglin Wang
Yiwei Li
Shaoxiong Feng
...
Jiayi Shi
Ji Zhang
Boyuan Pan
Yao Hu
Kan Li
20
0
0
10 Oct 2025
LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?
Kaijian Zou
Aaron Xiong
Yunxiang Zhang
Frederick Zhang
Yueqi Ren
Jirong Yang
Ayoung Lee
Shitanshu Bhushan
Lu Wang
ReLM
ALM
ELM
LRM
41
0
0
10 Oct 2025
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter
Leander Diaz-Bone
Ido Hakimi
Andreas Krause
Moritz Hardt
54
0
0
06 Oct 2025
Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
Guobin Shen
Dongcheng Zhao
Haibo Tong
Jindong Li
Feifei Zhao
Yi Zeng
16
0
0
01 Oct 2025
RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training
Tao Ren
Jinyang Jiang
Hui Yang
Wan Tian
Minhao Zou
...
Shentao Qin
Yanjun Zhao
Rui Tao
Hui Shao
Yijie Peng
20
0
0
01 Oct 2025
Round-trip Reinforcement Learning: Self-Consistent Training for Better Chemical LLMs
Lecheng Kong
Xiyuan Wang
Yixin Chen
Muhan Zhang
AI4CE
LRM
41
0
0
01 Oct 2025
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Zhepei Wei
X. J. Yang
Kai Sun
Jiaqi Wang
Rulin Shao
...
Rakesh Wanga
Anuj Kumar
Yu Meng
Wen-tau Yih
Xin Luna Dong
HILM
LRM
47
2
0
30 Sep 2025
NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning
Raviteja Anantha
Soheil Hor
Teodor Nicola Antoniu
Layne C. Price
AAML
LRM
20
0
0
27 Sep 2025
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning
Qizhi Pei
Zhuoshi Pan
Honglin Lin
Xin Gao
Yu Li
Zinan Tang
Conghui He
Rui Yan
Lijun Wu
AIMat
OffRL
LRM
79
0
0
25 Sep 2025
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Yujun Zhou
Zhenwen Liang
Haolin Liu
Wenhao Yu
Kishan Panaganti
Linfeng Song
Dian Yu
Xiangliang Zhang
Haitao Mi
Dong Yu
60
7
0
18 Sep 2025
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Fanqi Kong
Ruijie Zhang
Huaxiao Yin
Guibin Zhang
X. Zhang
Ziang Chen
Zhaowei Zhang
Xiaoyuan Zhang
Song-Chun Zhu
Xue Feng
AAML
106
0
0
17 Sep 2025
Discovering New Theorems via LLMs with In-Context Proof Learning in Lean
Kazumi Kasaura
Naoto Onda
Yuta Oriike
Masaya Taniguchi
Akiyoshi Sannai
Sho Sonoda
LRM
48
0
0
16 Sep 2025
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
Runpeng Dai
Linfeng Song
Haolin Liu
Zhenwen Liang
Dian Yu
...
Zhaopeng Tu
R. Liu
Tong Zheng
Hongtu Zhu
Dong Yu
LRM
56
5
0
11 Sep 2025
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Tong Zheng
H. Zhang
Wenhao Yu
Xiaoyang Wang
Runpeng Dai
R. Liu
Huiwen Bao
Chengsong Huang
Heng Huang
Dong Yu
AIMat
ReLM
OffRL
LRM
102
13
0
09 Sep 2025
One Token to Fool LLM-as-a-Judge
Yulai Zhao
Haolin Liu
Dian Yu
Sunyuan Kung
Meijia Chen
Haitao Mi
Dong Yu
OffRL
LRM
62
14
0
11 Jul 2025