2505.17989
Outcome-based Reinforcement Learning to Predict the Future
23 May 2025
Benjamin Turtel
Danny Franklin
Kris Skotheim
Luke Hewitt
Philipp Schoenegger
OffRL
AI4TS
Papers citing
"Outcome-based Reinforcement Learning to Predict the Future"
11 / 11 papers shown
Title
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
457
4
1
15 Apr 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
OffRL
LRM
196
171
0
26 Mar 2025
1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training
Han Zhao
Haotian Wang
Yiping Peng
Sitong Zhao
Xiaoyu Tian
Shuaiting Chen
Yunjie Ji
Xiangang Li
RALM
ReLM
LRM
147
16
0
25 Mar 2025
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Tian Xie
Zitian Gao
Qingnan Ren
Haoming Luo
Yuqian Hong
Bryan Dai
Joey Zhou
Kai Qiu
Zhirong Wu
Chong Luo
ReLM
OffRL
LRM
139
80
0
21 Feb 2025
LLMs Can Teach Themselves to Better Predict the Future
Benjamin Turtel
Danny Franklin
Philipp Schoenegger
LRM
169
1
0
07 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
380
2,000
0
22 Jan 2025
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu
Yuhang Jiang
Boyuan Wang
Yixiu Mao
Cheems Wang
Chang-Shu Liu
Xiangyang Ji
156
8
0
10 Jan 2025
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
167
1,287
0
05 Feb 2024
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
99
79
0
16 Oct 2023
RLTF: Reinforcement Learning from Unit Test Feedback
Jiate Liu
Yiqin Zhu
Kaiwen Xiao
Qiang Fu
Xiao Han
Wei Yang
Deheng Ye
OffRL
88
62
0
10 Jul 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
389
4,163
0
29 May 2023