ResearchTrend.AI
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem

12 December 2013
M. Zoghi
Shimon Whiteson
Rémi Munos
Maarten de Rijke

Papers citing "Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem"

43 / 43 papers shown
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu
Shuji Kijima
42
0
0
08 May 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
50
0
0
09 Feb 2025
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
88
0
0
04 Feb 2025
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
Fanzeng Xia
Hao Liu
Yisong Yue
Tongxin Li
72
1
0
03 Jan 2025
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
43
1
0
26 Aug 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
42
5
0
24 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Multi-Player Approaches for Dueling Bandits
Or Raveh
Junya Honda
Masashi Sugiyama
51
1
0
25 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
34
1
0
16 Apr 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
36
25
0
29 Jan 2024
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
Tian Huang
Ke Li
Ke Li
31
1
0
23 Nov 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
39
11
0
15 Mar 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
184
0
26 Jan 2023
Dueling Bandits: From Two-dueling to Multi-dueling
Yihan Du
Siwei Wang
Longbo Huang
19
3
0
16 Nov 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
51
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
30
3
0
27 Sep 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal
R. Ghuge
V. Nagarajan
30
1
0
25 Sep 2022
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits
Suprovat Ghoshal
Aadirupa Saha
25
11
0
23 Feb 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
41
8
0
14 Feb 2022
Non-Stationary Dueling Bandits
Patrick Kolpaczki
Viktor Bengs
Eyke Hüllermeier
40
7
0
02 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha
A. Krishnamurthy
42
35
0
24 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
38
82
0
08 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha
Shubham Gupta
33
10
0
06 Nov 2021
Choice functions based multi-objective Bayesian optimisation
A. Benavoli
Dario Azzimonti
Dario Piga
30
1
0
15 Oct 2021
Learning the Optimal Recommendation from Explorative Users
Fan Yao
Chuanhao Li
Denis Nekipelov
Hongning Wang
Haifeng Xu
OffRL
19
7
0
06 Oct 2021
Preference learning along multiple criteria: A game-theoretic perspective
Kush S. Bhatia
A. Pananjady
Peter L. Bartlett
Anca Dragan
Martin J. Wainwright
43
13
0
05 May 2021
A unified framework for closed-form nonparametric regression, classification, preference and mixed problems with Skew Gaussian Processes
A. Benavoli
Dario Azzimonti
Dario Piga
32
15
0
12 Dec 2020
Preferential Bayesian optimisation with Skew Gaussian Processes
A. Benavoli
Dario Azzimonti
Dario Piga
22
20
0
15 Aug 2020
Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
Chang Li
Maarten de Rijke
26
17
0
29 May 2019
KLUCB Approach to Copeland Bandits
Nischal Agrawal
P. Chaporkar
16
1
0
07 Feb 2019
Ordinal Monte Carlo Tree Search
Tobias Joppen
Johannes Fürnkranz
18
2
0
14 Jan 2019
PAC Battling Bandits in the Plackett-Luce Model
Aadirupa Saha
Aditya Gopalan
23
33
0
12 Aug 2018
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
L. Zintgraf
D. Roijers
Sjoerd Linders
Catholijn M. Jonker
A. Nowé
16
49
0
21 Feb 2018
Regret Analysis for Continuous Dueling Bandit
Wataru Kumagai
34
27
0
21 Nov 2017
Multi-dueling Bandits with Dependent Arms
Yanan Sui
Vincent Zhuang
J. W. Burdick
Yisong Yue
28
80
0
29 Apr 2017
Preferential Bayesian Optimization
Javier I. González
Zhenwen Dai
Andreas C. Damianou
Neil D. Lawrence
23
110
0
12 Apr 2017
Dueling Bandits with Dependent Arms
Bangrui Chen
P. Frazier
11
2
0
28 May 2016
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
Junpei Komiyama
Junya Honda
Hiroshi Nakagawa
22
39
0
05 May 2016
Double Thompson Sampling for Dueling Bandits
Huasen Wu
Xin Liu
22
87
0
25 Apr 2016
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
Junpei Komiyama
Junya Honda
H. Kashima
Hiroshi Nakagawa
17
92
0
08 Jun 2015
Copeland Dueling Bandits
M. Zoghi
Zohar Karnin
Shimon Whiteson
Maarten de Rijke
23
89
0
01 Jun 2015
Contextual Dueling Bandits
Miroslav Dudík
Katja Hofmann
Robert Schapire
Aleksandrs Slivkins
M. Zoghi
37
120
0
23 Feb 2015
Sparse Dueling Bandits
Kevin G. Jamieson
S. Katariya
Atul Deshpande
Robert D. Nowak
29
64
0
31 Jan 2015