arXiv:1506.02550
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
8 June 2015
Junpei Komiyama, Junya Honda, H. Kashima, Hiroshi Nakagawa
Papers citing "Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem"
24 papers
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu, Shuji Kijima
42
0
0
08 May 2025
Online Clustering of Dueling Bandits
Zhiyong Wang, Jiahang Sun, Mingze Kong, Jize Xie, Qinghua Hu, J. C. Lui, Zhongxiang Dai
91
0
0
04 Feb 2025
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
Fanzeng Xia, Hao Liu, Yisong Yue, Tongxin Li
72
1
0
03 Jan 2025
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi, Yue Kang, Yao Li
43
1
0
26 Aug 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, Patrick Jaillet, K. H. Low
42
5
0
24 Jul 2024
Multi-Player Approaches for Dueling Bandits
Or Raveh, Junya Honda, Masashi Sugiyama
54
1
0
25 May 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen, Liwei Wang
75
1
0
18 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di, Jiafan He, Quanquan Gu
34
1
0
16 Apr 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
36
25
0
29 Jan 2024
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
Tian Huang, Ke Li
31
1
0
23 Nov 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
Banghua Zhu, Jiantao Jiao, Michael I. Jordan
OffRL
44
184
0
26 Jan 2023
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening, Aadirupa Saha
51
6
0
25 Oct 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal, R. Ghuge, V. Nagarajan
30
1
0
25 Sep 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha, Pierre Gaillard
41
8
0
14 Feb 2022
Non-Stationary Dueling Bandits
Patrick Kolpaczki, Viktor Bengs, Eyke Hüllermeier
40
7
0
02 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha, A. Krishnamurthy
42
35
0
24 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
38
82
0
08 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha, Shubham Gupta
35
10
0
06 Nov 2021
Learning the Optimal Recommendation from Explorative Users
Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu
OffRL
19
7
0
06 Oct 2021
Minimal Exploration in Structured Stochastic Bandits
Richard Combes, Stefan Magureanu, Alexandre Proutiere
44
115
0
01 Nov 2017
Multi-dueling Bandits with Dependent Arms
Yanan Sui, Vincent Zhuang, J. W. Burdick, Yisong Yue
28
80
0
29 Apr 2017
Dueling Bandits with Dependent Arms
Bangrui Chen, P. Frazier
11
2
0
28 May 2016
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
Junpei Komiyama, Junya Honda, Hiroshi Nakagawa
22
39
0
05 May 2016
Double Thompson Sampling for Dueling Bandits
Huasen Wu, Xin Liu
22
87
0
25 Apr 2016