Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.04642
Cited By
Teaching Large Language Models to Reason with Reinforcement Learning
7 March 2024
Alex Havrilla
Yuqing Du
Sharath Chandra Raparthy
Christoforos Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Sainbayar Sukhbaatar
Roberta Raileanu
ReLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Teaching Large Language Models to Reason with Reinforcement Learning"
21 / 21 papers shown
Title
Multi-agent Embodied AI: Advances and Future Directions
Zhaohan Feng
Ruiqi Xue
Lei Yuan
Yang Yu
Ning Ding
M. Liu
Bingzhao Gao
Jian Sun
Gang Wang
AI4CE
57
1
0
08 May 2025
Chain-of-Thought Tokens are Computer Program Variables
Fangwei Zhu
Peiyi Wang
Zhifang Sui
LRM
44
0
0
08 May 2025
Aligning Constraint Generation with Design Intent in Parametric CAD
Evan Casey
Tianyu Zhang
Shu Ishida
John Roger Thompson
Amir Hosein Khasahmadi
Joseph George Lambourne
P. Jayaraman
K. Willis
38
0
0
17 Apr 2025
UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality
Zelei Cheng
Xin-Qiang Cai
Yuting Tang
Pushi Zhang
Boming Yang
Xinyu Xing
Xinyu Xing
49
0
0
10 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM
LRM
93
30
0
03 Mar 2025
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng
Yuzhen Huang
Lulu Zhao
Yijun Wang
Zifei Shan
Junxian He
LRM
40
7
0
23 Dec 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Aaron C. Courville
OffRL
79
5
0
23 Oct 2024
Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
Abhijnan Nath
Changsoo Jung
Ethan Seefried
Nikhil Krishnaswamy
134
1
0
11 Oct 2024
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Zirui Zhao
Hanze Dong
Amrita Saha
Caiming Xiong
Doyen Sahoo
LRM
32
3
0
10 Oct 2024
From Lists to Emojis: How Format Bias Affects Model Alignment
Xuanchang Zhang
Wei Xiong
Lichang Chen
Dinesh Manocha
Heng Huang
Tong Zhang
ALM
35
11
0
18 Sep 2024
Large Language Models Assume People are More Rational than We Really are
Ryan Liu
Jiayi Geng
Joshua C. Peterson
Ilia Sucholutsky
Thomas L. Griffiths
76
16
0
24 Jun 2024
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
N. Sebe
Mubarak Shah
EGVM
89
6
0
22 May 2024
The pitfalls of next-token prediction
Gregor Bachmann
Vaishnavh Nagarajan
37
62
0
11 Mar 2024
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk
Ishita Mediratta
Christoforos Nalmpantis
Jelena Luketina
Eric Hambro
Edward Grefenstette
Roberta Raileanu
AI4CE
ALM
115
122
0
10 Oct 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan.Z Sheng
Julian McAuley
Lina Yao
SyDa
46
10
0
28 Aug 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
502
0
28 Sep 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
S. Hoi
SyDa
ALM
129
240
0
05 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
316
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
379
8,495
0
28 Jan 2022
Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning
Xiangyu Peng
Siyan Li
Sarah Wiegreffe
Mark O. Riedl
LRM
50
38
0
04 May 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,595
0
18 Sep 2019
1