Reducing Dueling Bandits to Cardinal Bandits

14 May 2014

Papers citing "Reducing Dueling Bandits to Cardinal Bandits"


Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits Shang Lu Shuji Kijima 40 0 0 08 May 2025
Clustering Items through Bandit Feedback: Finding the Right Feature out of Many Maximilian Graf Victor Thuot Nicolas Verzélen 46 0 0 14 Mar 2025
Online Clustering of Dueling Bandits Zhiyong Wang Jiahang Sun Mingze Kong Jize Xie Qinghua Hu J. C. Lui Zhongxiang Dai 83 0 0 04 Feb 2025
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents Fanzeng Xia Hao Liu Yisong Yue Tongxin Li 67 1 0 03 Jan 2025
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference Qining Zhang Lei Ying OffRL 37 2 0 25 Sep 2024
Biased Dueling Bandits with Stochastic Delayed Feedback Bongsoo Yi Yue Kang Yao Li 38 1 0 26 Aug 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback Arun Verma Zhongxiang Dai Xiaoqiang Lin Patrick Jaillet K. H. Low 37 5 0 24 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis Qining Zhang Honghao Wei Lei Ying OffRL 67 1 0 11 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback Ruitao Chen Liwei Wang 75 1 0 18 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback Qiwei Di Jiafan He Quanquan Gu 29 1 0 16 Apr 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF Banghua Zhu Michael I. Jordan Jiantao Jiao 31 25 0 29 Jan 2024
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons Banghua Zhu Jiantao Jiao Michael I. Jordan OffRL 42 183 0 26 Jan 2023
Dueling Bandits: From Two-dueling to Multi-dueling Yihan Du Siwei Wang Longbo Huang 11 3 0 16 Nov 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits Thomas Kleine Buening Aadirupa Saha 46 6 0 25 Oct 2022
Dueling Convex Optimization with General Preferences Aadirupa Saha Tomer Koren Yishay Mansour 28 2 0 27 Sep 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem Arpit Agarwal R. Ghuge V. Nagarajan 22 1 0 25 Sep 2022
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits Suprovat Ghoshal Aadirupa Saha 23 11 0 23 Feb 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences Aadirupa Saha Pierre Gaillard 36 8 0 14 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability Aadirupa Saha A. Krishnamurthy 39 35 0 24 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences Aldo Pacchiano Aadirupa Saha Jonathan Lee 33 82 0 08 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits Aadirupa Saha Shubham Gupta 33 10 0 06 Nov 2021
Preference learning along multiple criteria: A game-theoretic perspective Kush S. Bhatia A. Pananjady Peter L. Bartlett Anca Dragan Martin J. Wainwright 25 13 0 05 May 2021
KLUCB Approach to Copeland Bandits Nischal Agrawal P. Chaporkar 11 1 0 07 Feb 2019
Ordinal Monte Carlo Tree Search Tobias Joppen Johannes Furnkranz 11 2 0 14 Jan 2019
PAC Battling Bandits in the Plackett-Luce Model Aadirupa Saha Aditya Gopalan 23 33 0 12 Aug 2018
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits Julian Zimmert Yevgeny Seldin AAML 21 174 0 19 Jul 2018
Regret Analysis for Continuous Dueling Bandit Wataru Kumagai 26 27 0 21 Nov 2017
Multi-dueling Bandits with Dependent Arms Yanan Sui Vincent Zhuang J. W. Burdick Yisong Yue 20 80 0 29 Apr 2017
Preferential Bayesian Optimization Javier I. González Zhenwen Dai Andreas C. Damianou Neil D. Lawrence 17 110 0 12 Apr 2017
Double Thompson Sampling for Dueling Bandits Huasen Wu Xin Liu 14 87 0 25 Apr 2016
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem Junpei Komiyama Junya Honda H. Kashima Hiroshi Nakagawa 15 92 0 08 Jun 2015
Copeland Dueling Bandits M. Zoghi Zohar Karnin Shimon Whiteson Maarten de Rijke 21 89 0 01 Jun 2015
Contextual Dueling Bandits Miroslav Dudík Katja Hofmann Robert Schapire Aleksandrs Slivkins M. Zoghi 32 120 0 23 Feb 2015
Sparse Dueling Bandits Kevin G. Jamieson S. Katariya Atul Deshpande Robert D. Nowak 21 64 0 31 Jan 2015