1405.3396
Reducing Dueling Bandits to Cardinal Bandits
14 May 2014
Nir Ailon
Thorsten Joachims
Zohar Karnin
Papers citing
"Reducing Dueling Bandits to Cardinal Bandits"
34 / 34 papers shown
Title
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu
Shuji Kijima
40
0
0
08 May 2025
Clustering Items through Bandit Feedback: Finding the Right Feature out of Many
Maximilian Graf
Victor Thuot
Nicolas Verzélen
46
0
0
14 Mar 2025
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
83
0
0
04 Feb 2025
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
Fanzeng Xia
Hao Liu
Yisong Yue
Tongxin Li
67
1
0
03 Jan 2025
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
37
2
0
25 Sep 2024
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
38
1
0
26 Aug 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
37
5
0
24 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
75
1
0
18 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
29
1
0
16 Apr 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
183
0
26 Jan 2023
Dueling Bandits: From Two-dueling to Multi-dueling
Yihan Du
Siwei Wang
Longbo Huang
11
3
0
16 Nov 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
46
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
28
2
0
27 Sep 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal
R. Ghuge
V. Nagarajan
22
1
0
25 Sep 2022
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits
Suprovat Ghoshal
Aadirupa Saha
23
11
0
23 Feb 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
36
8
0
14 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha
A. Krishnamurthy
39
35
0
24 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
82
0
08 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha
Shubham Gupta
33
10
0
06 Nov 2021
Preference learning along multiple criteria: A game-theoretic perspective
Kush S. Bhatia
A. Pananjady
Peter L. Bartlett
Anca Dragan
Martin J. Wainwright
25
13
0
05 May 2021
KLUCB Approach to Copeland Bandits
Nischal Agrawal
P. Chaporkar
11
1
0
07 Feb 2019
Ordinal Monte Carlo Tree Search
Tobias Joppen
Johannes Fürnkranz
11
2
0
14 Jan 2019
PAC Battling Bandits in the Plackett-Luce Model
Aadirupa Saha
Aditya Gopalan
23
33
0
12 Aug 2018
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits
Julian Zimmert
Yevgeny Seldin
AAML
21
174
0
19 Jul 2018
Regret Analysis for Continuous Dueling Bandit
Wataru Kumagai
26
27
0
21 Nov 2017
Multi-dueling Bandits with Dependent Arms
Yanan Sui
Vincent Zhuang
J. W. Burdick
Yisong Yue
20
80
0
29 Apr 2017
Preferential Bayesian Optimization
Javier I. González
Zhenwen Dai
Andreas C. Damianou
Neil D. Lawrence
17
110
0
12 Apr 2017
Double Thompson Sampling for Dueling Bandits
Huasen Wu
Xin Liu
14
87
0
25 Apr 2016
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
Junpei Komiyama
Junya Honda
H. Kashima
Hiroshi Nakagawa
15
92
0
08 Jun 2015
Copeland Dueling Bandits
M. Zoghi
Zohar Karnin
Shimon Whiteson
Maarten de Rijke
21
89
0
01 Jun 2015
Contextual Dueling Bandits
Miroslav Dudík
Katja Hofmann
Robert Schapire
Aleksandrs Slivkins
M. Zoghi
32
120
0
23 Feb 2015
Sparse Dueling Bandits
Kevin G. Jamieson
S. Katariya
Atul Deshpande
Robert D. Nowak
21
64
0
31 Jan 2015