Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.18505
Cited By
Provable Reward-Agnostic Preference-Based Reinforcement Learning
29 May 2023
Wenhao Zhan
Masatoshi Uehara
Wen Sun
Jason D. Lee
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Provable Reward-Agnostic Preference-Based Reinforcement Learning"
9 / 9 papers shown
Title
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu
Ethan X. Fang
Junwei Lu
146
0
0
27 Apr 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
50
0
0
20 Apr 2025
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
37
2
0
25 Sep 2024
Learning General World Models in a Handful of Reward-Free Deployments
Yingchen Xu
Jack Parker-Holder
Aldo Pacchiano
Philip J. Ball
Oleh Rybkin
Stephen J. Roberts
Tim Rocktaschel
Edward Grefenstette
OffRL
55
8
0
23 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
502
0
28 Sep 2022
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen
Han Zhong
Zhuoran Yang
Zhaoran Wang
Liwei Wang
123
60
0
23 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Reward-Free Exploration for Reinforcement Learning
Chi Jin
A. Krishnamurthy
Max Simchowitz
Tiancheng Yu
OffRL
112
194
0
07 Feb 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,595
0
18 Sep 2019
1