Reducing Dueling Bandits to Cardinal Bandits

arXiv: 1405.3396

14 May 2014
Nir Ailon
Thorsten Joachims
Zohar Karnin

Papers citing "Reducing Dueling Bandits to Cardinal Bandits"

34 / 34 papers shown
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu
Shuji Kijima
40
0
0
08 May 2025
Clustering Items through Bandit Feedback: Finding the Right Feature out of Many
Maximilian Graf
Victor Thuot
Nicolas Verzélen
46
0
0
14 Mar 2025
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
83
0
0
04 Feb 2025
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
Fanzeng Xia
Hao Liu
Yisong Yue
Tongxin Li
67
1
0
03 Jan 2025
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
37
2
0
25 Sep 2024
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
38
1
0
26 Aug 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
37
5
0
24 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
75
1
0
18 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
29
1
0
16 Apr 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
183
0
26 Jan 2023
Dueling Bandits: From Two-dueling to Multi-dueling
Yihan Du
Siwei Wang
Longbo Huang
11
3
0
16 Nov 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
46
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
28
2
0
27 Sep 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal
R. Ghuge
V. Nagarajan
22
1
0
25 Sep 2022
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits
Suprovat Ghoshal
Aadirupa Saha
23
11
0
23 Feb 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
36
8
0
14 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha
A. Krishnamurthy
39
35
0
24 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
82
0
08 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha
Shubham Gupta
33
10
0
06 Nov 2021
Preference learning along multiple criteria: A game-theoretic perspective
Kush S. Bhatia
A. Pananjady
Peter L. Bartlett
Anca Dragan
Martin J. Wainwright
25
13
0
05 May 2021
KLUCB Approach to Copeland Bandits
Nischal Agrawal
P. Chaporkar
11
1
0
07 Feb 2019
Ordinal Monte Carlo Tree Search
Tobias Joppen
Johannes Furnkranz
11
2
0
14 Jan 2019
PAC Battling Bandits in the Plackett-Luce Model
Aadirupa Saha
Aditya Gopalan
23
33
0
12 Aug 2018
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits
Julian Zimmert
Yevgeny Seldin
AAML
21
174
0
19 Jul 2018
Regret Analysis for Continuous Dueling Bandit
Wataru Kumagai
26
27
0
21 Nov 2017
Multi-dueling Bandits with Dependent Arms
Yanan Sui
Vincent Zhuang
J. W. Burdick
Yisong Yue
20
80
0
29 Apr 2017
Preferential Bayesian Optimization
Javier I. González
Zhenwen Dai
Andreas C. Damianou
Neil D. Lawrence
17
110
0
12 Apr 2017
Double Thompson Sampling for Dueling Bandits
Huasen Wu
Xin Liu
14
87
0
25 Apr 2016
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
Junpei Komiyama
Junya Honda
H. Kashima
Hiroshi Nakagawa
15
92
0
08 Jun 2015
Copeland Dueling Bandits
M. Zoghi
Zohar Karnin
Shimon Whiteson
Maarten de Rijke
21
89
0
01 Jun 2015
Contextual Dueling Bandits
Miroslav Dudík
Katja Hofmann
Robert Schapire
Aleksandrs Slivkins
M. Zoghi
32
120
0
23 Feb 2015
Sparse Dueling Bandits
Kevin G. Jamieson
S. Katariya
Atul Deshpande
Robert D. Nowak
21
64
0
31 Jan 2015