Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.01289
Cited By
Dueling Posterior Sampling for Preference-Based Reinforcement Learning
4 August 2019
Ellen R. Novoseller
Yibing Wei
Yanan Sui
Yisong Yue
J. W. Burdick
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dueling Posterior Sampling for Preference-Based Reinforcement Learning"
19 / 19 papers shown
Title
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu
Ethan X. Fang
Junwei Lu
224
0
0
27 Apr 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
55
0
0
20 Apr 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
85
0
0
26 Feb 2025
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
40
2
0
25 Sep 2024
Advances in Preference-based Reinforcement Learning: A Review
Youssef Abdelkareem
Shady Shehata
Fakhri Karray
OffRL
51
9
0
21 Aug 2024
Preference-Guided Reinforcement Learning for Efficient Exploration
Guojian Wang
Faguo Wu
Xiao Zhang
Tianyuan Chen
Xuyang Chen
Lin Zhao
45
0
0
09 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Comparisons Are All You Need for Optimizing Smooth Functions
Chenyi Zhang
Tongyang Li
AAML
37
1
0
19 May 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
48
96
0
08 Jan 2024
A Learning-Based Framework for Safe Human-Robot Collaboration with Multiple Backup Control Barrier Functions
Neil C. Janwani
Ersin Daş
Thomas Touma
Skylar X. Wei
Tamas G. Molnar
J. W. Burdick
32
2
0
09 Oct 2023
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
37
55
0
29 May 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or
K
K
K
-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
184
0
26 Jan 2023
Learning from Imperfect Demonstrations via Adversarial Confidence Transfer
Zhangjie Cao
Zihan Wang
Dorsa Sadigh
AAML
37
7
0
07 Feb 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
82
0
08 Nov 2021
On the Expressivity of Markov Reward
David Abel
Will Dabney
Anna Harutyunyan
Mark K. Ho
Michael L. Littman
Doina Precup
Satinder Singh
29
82
0
01 Nov 2021
Preference-based Reinforcement Learning with Finite-Time Guarantees
Yichong Xu
Ruosong Wang
Lin F. Yang
Aarti Singh
A. Dubrawski
36
53
0
16 Jun 2020
Early Detection of Combustion Instabilities using Deep Convolutional Selective Autoencoders on Hi-speed Flame Video
Chandrayee Basu
Qian Yang
M. Singhal
Anca Dragan
51
174
0
25 Mar 2016
Online Structured Prediction via Coactive Learning
Pannagadatta K. Shivaswamy
Thorsten Joachims
HAI
68
66
0
18 May 2012
1