arXiv:2310.15288 (v2, latest)
Active teacher selection for reinforcement learning from human feedback
Rachel Freedman, Justin Svegliato, K. H. Wray, Stuart J. Russell
23 October 2023
Papers citing "Active teacher selection for reinforcement learning from human feedback" (23 of 23 papers shown)

| Title | Authors | Tags | Metrics | Date |
|---|---|---|---|---|
| Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning | Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi | | 85 / 4 / 0 | 03 Apr 2025 |
| When Can Proxies Improve the Sample Complexity of Preference Learning? | Yuchen Zhu, Daniel Augusto de Souza, Zhengyan Shi, Mengyue Yang, Pasquale Minervini, Alexander D'Amour, Matt J. Kusner | | 125 / 1 / 0 | 21 Dec 2024 |
| Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback | Vincent Conitzer, Rachel Freedman, J. Heitzig, Wesley H. Holliday, Bob M. Jacobs, ..., Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, W. Zwicker | | 103 / 40 / 0 | 16 Apr 2024 |
| Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell | ALM, OffRL | 140 / 531 / 0 | 27 Jul 2023 |
| Active Reward Learning from Multiple Teachers | Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart J. Russell | | 57 / 15 / 0 | 02 Mar 2023 |
| On the Sensitivity of Reward Inference to Misspecified Human Models | Joey Hong, Kush S. Bhatia, Anca Dragan | | 52 / 26 / 0 | 09 Dec 2022 |
| Misspecification in Inverse Reinforcement Learning | Joar Skalse, Alessandro Abate | | 64 / 24 / 0 | 06 Dec 2022 |
| The Expertise Problem: Learning from Specialized Feedback | Oliver Daniels-Koch, Rachel Freedman | OffRL | 56 / 18 / 0 | 12 Nov 2022 |
| Defining and Characterizing Reward Hacking | Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David M. Krueger | | 115 / 61 / 0 | 27 Sep 2022 |
| Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, ..., Jack Clark, Sam McCandlish, C. Olah, Benjamin Mann, Jared Kaplan | | 256 / 2,623 / 0 | 12 Apr 2022 |
| Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 888 / 13,207 / 0 | 04 Mar 2022 |
| The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models | Alexander Pan, Kush S. Bhatia, Jacob Steinhardt | | 98 / 182 / 0 | 10 Jan 2022 |
| PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training | Kimin Lee, Laura M. Smith, Pieter Abbeel | OffRL | 65 / 289 / 0 | 09 Jun 2021 |
| Consequences of Misaligned AI | Simon Zhuang, Dylan Hadfield-Menell | | 71 / 75 / 0 | 07 Feb 2021 |
| Choice Set Misspecification in Reward Inference | Rachel Freedman, Rohin Shah, Anca Dragan | | 69 / 19 / 0 | 19 Jan 2021 |
| Understanding Learned Reward Functions | Eric J. Michaud, Adam Gleave, Stuart J. Russell | XAI, OffRL | 69 / 34 / 0 | 10 Dec 2020 |
| Learning to summarize from human feedback | Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano | ALM | 259 / 2,189 / 0 | 02 Sep 2020 |
| Introduction to Multi-Armed Bandits | Aleksandrs Slivkins | | 658 / 1,023 / 0 | 15 Apr 2019 |
| Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning | S. Milli, Anca Dragan | | 70 / 22 / 0 | 09 Mar 2019 |
| An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning | Dhruv Malik, Malayandi Palaniappan, J. F. Fisac, Dylan Hadfield-Menell, Stuart J. Russell, Anca Dragan | | 63 / 31 / 0 | 11 Jun 2018 |
| Deep reinforcement learning from human preferences | Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei | | 218 / 3,377 / 0 | 12 Jun 2017 |
| Cooperative Inverse Reinforcement Learning | Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart J. Russell | | 99 / 644 / 0 | 09 Jun 2016 |
| The Complexity of Decentralized Control of Markov Decision Processes | D. Bernstein, S. Zilberstein, N. Immerman | | 111 / 1,595 / 0 | 16 Jan 2013 |