ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1506.02550
  4. Cited By
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

8 June 2015
Junpei Komiyama
Junya Honda
H. Kashima
Hiroshi Nakagawa
ArXivPDFHTML

Papers citing "Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem"

24 / 24 papers shown
Title
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu
Shuji Kijima
42
0
0
08 May 2025
Online Clustering of Dueling Bandits
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
91
0
0
04 Feb 2025
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
Fanzeng Xia
Hao Liu
Yisong Yue
Tongxin Li
72
1
0
03 Jan 2025
Biased Dueling Bandits with Stochastic Delayed Feedback
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
43
1
0
26 Aug 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
42
5
0
24 Jul 2024
Multi-Player Approaches for Dueling Bandits
Multi-Player Approaches for Dueling Bandits
Or Raveh
Junya Honda
Masashi Sugiyama
54
1
0
25 May 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
75
1
0
18 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
34
1
0
16 Apr 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and
  Overoptimization in RLHF
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
36
25
0
29 Jan 2024
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
Tian Huang
Ke Li
Ke Li
31
1
0
23 Nov 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or
  $K$-wise Comparisons
Principled Reinforcement Learning with Human Feedback from Pairwise or KKK-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
44
184
0
26 Jan 2023
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive
  Non-Stationary Dueling Bandits
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
51
6
0
25 Oct 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit
  Problem
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal
R. Ghuge
V. Nagarajan
30
1
0
25 Sep 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online
  Learning from Preferences
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
41
8
0
14 Feb 2022
Non-Stationary Dueling Bandits
Non-Stationary Dueling Bandits
Patrick Kolpaczki
Viktor Bengs
Eyke Hüllermeier
40
7
0
02 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under
  Realizability
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha
A. Krishnamurthy
42
35
0
24 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
38
82
0
08 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary
  Dueling Bandits
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha
Shubham Gupta
35
10
0
06 Nov 2021
Learning the Optimal Recommendation from Explorative Users
Learning the Optimal Recommendation from Explorative Users
Fan Yao
Chuanhao Li
Denis Nekipelov
Hongning Wang
Haifeng Xu
OffRL
19
7
0
06 Oct 2021
Minimal Exploration in Structured Stochastic Bandits
Minimal Exploration in Structured Stochastic Bandits
Richard Combes
Stefan Magureanu
Alexandre Proutiere
44
115
0
01 Nov 2017
Multi-dueling Bandits with Dependent Arms
Multi-dueling Bandits with Dependent Arms
Yanan Sui
Vincent Zhuang
J. W. Burdick
Yisong Yue
28
80
0
29 Apr 2017
Dueling Bandits with Dependent Arms
Dueling Bandits with Dependent Arms
Bangrui Chen
P. Frazier
11
2
0
28 May 2016
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm,
  and Computationally Efficient Algorithm
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
Junpei Komiyama
Junya Honda
Hiroshi Nakagawa
22
39
0
05 May 2016
Double Thompson Sampling for Dueling Bandits
Double Thompson Sampling for Dueling Bandits
Huasen Wu
Xin Liu
22
87
0
25 Apr 2016
1