Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.12306
Cited By
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
24 November 2021
Aadirupa Saha
A. Krishnamurthy
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability"
17 / 17 papers shown
Title
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
58
3
0
06 Oct 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
P. Jaillet
K. H. Low
37
5
0
24 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
72
1
0
18 May 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
47
0
0
05 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
29
1
0
16 Apr 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
24
17
0
14 Feb 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
41
94
0
08 Jan 2024
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
31
54
0
29 May 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or
K
K
K
-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
33
181
0
26 Jan 2023
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
43
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
20
2
0
27 Sep 2022
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles
Aldo G. Carranza
Sanath Kumar Krishnamurthy
Susan Athey
16
1
0
30 Mar 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
36
8
0
14 Feb 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
81
0
08 Nov 2021
Optimal Dynamic Regret in Exp-Concave Online Learning
Dheeraj Baby
Yu-Xiang Wang
45
43
0
23 Apr 2021
1