

Active teacher selection for reinforcement learning from human feedback

23 October 2023
Rachel Freedman
Justin Svegliato
K. H. Wray
Stuart J. Russell
arXiv (abs) · PDF · HTML

Papers citing "Active teacher selection for reinforcement learning from human feedback"

23 papers shown:
1. Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
   Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
   03 Apr 2025
2. When Can Proxies Improve the Sample Complexity of Preference Learning?
   Yuchen Zhu, Daniel Augusto de Souza, Zhengyan Shi, Mengyue Yang, Pasquale Minervini, Alexander D'Amour, Matt J. Kusner
   21 Dec 2024
3. Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
   Vincent Conitzer, Rachel Freedman, J. Heitzig, Wesley H. Holliday, Bob M. Jacobs, ..., Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, W. Zwicker
   16 Apr 2024
4. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
   Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
   27 Jul 2023 · ALM, OffRL
5. Active Reward Learning from Multiple Teachers
   Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart J. Russell
   02 Mar 2023
6. On the Sensitivity of Reward Inference to Misspecified Human Models
   Joey Hong, Kush S. Bhatia, Anca Dragan
   09 Dec 2022
7. Misspecification in Inverse Reinforcement Learning
   Joar Skalse, Alessandro Abate
   06 Dec 2022
8. The Expertise Problem: Learning from Specialized Feedback
   Oliver Daniels-Koch, Rachel Freedman
   12 Nov 2022 · OffRL
9. Defining and Characterizing Reward Hacking
   Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David M. Krueger
   27 Sep 2022
10. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
    Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, ..., Jack Clark, Sam McCandlish, C. Olah, Benjamin Mann, Jared Kaplan
    12 Apr 2022
11. Training language models to follow instructions with human feedback
    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
    04 Mar 2022 · OSLM, ALM
12. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
    Alexander Pan, Kush S. Bhatia, Jacob Steinhardt
    10 Jan 2022
13. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training
    Kimin Lee, Laura M. Smith, Pieter Abbeel
    09 Jun 2021 · OffRL
14. Consequences of Misaligned AI
    Simon Zhuang, Dylan Hadfield-Menell
    07 Feb 2021
15. Choice Set Misspecification in Reward Inference
    Rachel Freedman, Rohin Shah, Anca Dragan
    19 Jan 2021
16. Understanding Learned Reward Functions
    Eric J. Michaud, Adam Gleave, Stuart J. Russell
    10 Dec 2020 · XAI, OffRL
17. Learning to summarize from human feedback
    Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
    02 Sep 2020 · ALM
18. Introduction to Multi-Armed Bandits
    Aleksandrs Slivkins
    15 Apr 2019
19. Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning
    S. Milli, Anca Dragan
    09 Mar 2019
20. An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
    Dhruv Malik, Malayandi Palaniappan, J. F. Fisac, Dylan Hadfield-Menell, Stuart J. Russell, Anca Dragan
    11 Jun 2018
21. Deep reinforcement learning from human preferences
    Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
    12 Jun 2017
22. Cooperative Inverse Reinforcement Learning
    Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart J. Russell
    09 Jun 2016
23. The Complexity of Decentralized Control of Markov Decision Processes
    D. Bernstein, S. Zilberstein, N. Immerman
    16 Jan 2013