1312.3393
Cited By
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
12 December 2013
M. Zoghi
Shimon Whiteson
Rémi Munos
Maarten de Rijke
Papers citing "Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem"
43 / 43 papers shown
Title
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu
Shuji Kijima
42
0
0
08 May 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
50
0
0
09 Feb 2025
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
88
0
0
04 Feb 2025
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
Fanzeng Xia
Hao Liu
Yisong Yue
Tongxin Li
72
1
0
03 Jan 2025
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
43
1
0
26 Aug 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
42
5
0
24 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Multi-Player Approaches for Dueling Bandits
Or Raveh
Junya Honda
Masashi Sugiyama
51
1
0
25 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
34
1
0
16 Apr 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
36
25
0
29 Jan 2024
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
Tian Huang
Ke Li
31
1
0
23 Nov 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
39
11
0
15 Mar 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
184
0
26 Jan 2023
Dueling Bandits: From Two-dueling to Multi-dueling
Yihan Du
Siwei Wang
Longbo Huang
19
3
0
16 Nov 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
51
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
30
3
0
27 Sep 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal
R. Ghuge
V. Nagarajan
30
1
0
25 Sep 2022
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits
Suprovat Ghoshal
Aadirupa Saha
25
11
0
23 Feb 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
41
8
0
14 Feb 2022
Non-Stationary Dueling Bandits
Patrick Kolpaczki
Viktor Bengs
Eyke Hüllermeier
40
7
0
02 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha
A. Krishnamurthy
42
35
0
24 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
38
82
0
08 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha
Shubham Gupta
33
10
0
06 Nov 2021
Choice functions based multi-objective Bayesian optimisation
A. Benavoli
Dario Azzimonti
Dario Piga
30
1
0
15 Oct 2021
Learning the Optimal Recommendation from Explorative Users
Fan Yao
Chuanhao Li
Denis Nekipelov
Hongning Wang
Haifeng Xu
OffRL
19
7
0
06 Oct 2021
Preference learning along multiple criteria: A game-theoretic perspective
Kush S. Bhatia
A. Pananjady
Peter L. Bartlett
Anca Dragan
Martin J. Wainwright
43
13
0
05 May 2021
A unified framework for closed-form nonparametric regression, classification, preference and mixed problems with Skew Gaussian Processes
A. Benavoli
Dario Azzimonti
Dario Piga
32
15
0
12 Dec 2020
Preferential Bayesian optimisation with Skew Gaussian Processes
A. Benavoli
Dario Azzimonti
Dario Piga
22
20
0
15 Aug 2020
Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
Chang Li
Maarten de Rijke
26
17
0
29 May 2019
KLUCB Approach to Copeland Bandits
Nischal Agrawal
P. Chaporkar
16
1
0
07 Feb 2019
Ordinal Monte Carlo Tree Search
Tobias Joppen
Johannes Fürnkranz
18
2
0
14 Jan 2019
PAC Battling Bandits in the Plackett-Luce Model
Aadirupa Saha
Aditya Gopalan
23
33
0
12 Aug 2018
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
L. Zintgraf
D. Roijers
Sjoerd Linders
Catholijn M. Jonker
A. Nowé
16
49
0
21 Feb 2018
Regret Analysis for Continuous Dueling Bandit
Wataru Kumagai
34
27
0
21 Nov 2017
Multi-dueling Bandits with Dependent Arms
Yanan Sui
Vincent Zhuang
J. W. Burdick
Yisong Yue
28
80
0
29 Apr 2017
Preferential Bayesian Optimization
Javier I. González
Zhenwen Dai
Andreas C. Damianou
Neil D. Lawrence
23
110
0
12 Apr 2017
Dueling Bandits with Dependent Arms
Bangrui Chen
P. Frazier
11
2
0
28 May 2016
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
Junpei Komiyama
Junya Honda
Hiroshi Nakagawa
22
39
0
05 May 2016
Double Thompson Sampling for Dueling Bandits
Huasen Wu
Xin Liu
22
87
0
25 Apr 2016
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
Junpei Komiyama
Junya Honda
H. Kashima
Hiroshi Nakagawa
17
92
0
08 Jun 2015
Copeland Dueling Bandits
M. Zoghi
Zohar Karnin
Shimon Whiteson
Maarten de Rijke
23
89
0
01 Jun 2015
Contextual Dueling Bandits
Miroslav Dudík
Katja Hofmann
Robert Schapire
Aleksandrs Slivkins
M. Zoghi
37
120
0
23 Feb 2015
Sparse Dueling Bandits
Kevin G. Jamieson
S. Katariya
Atul Deshpande
Robert D. Nowak
29
64
0
31 Jan 2015