Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

16 April 2024

Qiwei Di

Quanquan Gu

Papers citing "Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback"

A Model Selection Approach for Corruption Robust Reinforcement Learning Chen-Yu Wei Christoph Dann Julian Zimmert 109 45 0 31 Dec 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback Arun Verma Zhongxiang Dai Xiaoqiang Lin Patrick Jaillet K. H. Low 138 5 0 24 Jul 2024
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling Yuwei Cheng Fan Yao Xuefeng Liu Haifeng Xu 75 1 0 18 May 2024
Feel-Good Thompson Sampling for Contextual Dueling Bandits Xuheng Li Heyang Zhao Quanquan Gu 63 13 0 09 Apr 2024
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits Yuko Kuroki Alberto Rumi Taira Tsuchiya Fabio Vitale Nicolò Cesa-Bianchi 73 7 0 24 Dec 2023
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits Qiwei Di Tao Jin Yue Wu Heyang Zhao Farzad Farnoud Quanquan Gu 64 13 0 02 Oct 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits Yue Wu Tao Jin Hao Lou Farzad Farnoud Quanquan Gu 61 11 0 15 Mar 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons Banghua Zhu Jiantao Jiao Michael I. Jordan OffRL 77 201 0 26 Jan 2023
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes Chen Ye Wei Xiong Quanquan Gu Tong Zhang 126 31 0 12 Dec 2022
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions Jiafan He Dongruo Zhou Tong Zhang Quanquan Gu 87 47 0 13 May 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 760 12,835 0 04 Mar 2022
Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models Viktor Bengs Aadirupa Saha Eyke Hüllermeier 27 23 0 09 Feb 2022
Jointly Efficient and Optimal Algorithms for Logistic Bandits Louis Faury Marc Abeille Kwang-Sung Jun Clément Calauzènes 49 20 0 06 Jan 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability Aadirupa Saha A. Krishnamurthy 56 36 0 24 Nov 2021
Linear Contextual Bandits with Adversarial Corruptions Heyang Zhao Dongruo Zhou Quanquan Gu AAML 70 24 0 25 Oct 2021
Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks Qin Ding Cho-Jui Hsieh James Sharpnack AAML 46 33 0 05 Jun 2021
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously Chung-Wei Lee Haipeng Luo Chen-Yu Wei Mengxiao Zhang Xiaojin Zhang 68 49 0 11 Feb 2021
Robust Policy Gradient against Strong Data Corruption Xuezhou Zhang Yiding Chen Xiaojin Zhu Wen Sun AAML 82 38 0 11 Feb 2021
Adversarial Dueling Bandits Aadirupa Saha Tomer Koren Yishay Mansour 60 27 0 27 Oct 2020
Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits Marc Abeille Louis Faury Clément Calauzènes 118 37 0 23 Oct 2020
The Ingredients of Real-World Robotic Reinforcement Learning Henry Zhu Justin Yu Abhishek Gupta Dhruv Shah Kristian Hartikainen Avi Singh Vikash Kumar Sergey Levine OffRL 100 176 0 27 Apr 2020
Improved Optimistic Algorithms for Logistic Bandits Louis Faury Marc Abeille Clément Calauzènes Olivier Fercoq 70 93 0 18 Feb 2020
Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning Tianhe Yu Deirdre Quillen Zhanpeng He Ryan Julian Avnish Narayan Hayden Shively Adithya Bellathur Karol Hausman Chelsea Finn Sergey Levine OffRL 224 1,160 0 24 Oct 2019
Stochastic Linear Optimization with Adversarial Corruption Yingkai Li Edmund Y. Lou Liren Shan AAML 43 42 0 04 Sep 2019
Better Algorithms for Stochastic Bandits with Adversarial Corruptions Anupam Gupta Tomer Koren Kunal Talwar AAML 92 152 0 22 Feb 2019
Stochastic bandits robust to adversarial corruptions Thodoris Lykouris Vahab Mirrokni R. Leme AAML 119 204 0 25 Mar 2018
Approximate Ranking from Pairwise Comparisons Reinhard Heckel Max Simchowitz Kannan Ramchandran Martin J. Wainwright 52 39 0 04 Jan 2018
An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits Yevgeny Seldin Gábor Lugosi 51 92 0 20 Feb 2017
An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits P. Auer Chao-Kai Chiang 53 111 0 27 May 2016
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm Junpei Komiyama Junya Honda Hiroshi Nakagawa 44 39 0 05 May 2016
Double Thompson Sampling for Dueling Bandits Huasen Wu Xin Liu 88 87 0 25 Apr 2016
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits Pratik Gajane Tanguy Urvoy Fabrice Clérot 76 46 0 15 Jan 2016
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem Junpei Komiyama Junya Honda H. Kashima Hiroshi Nakagawa 141 92 0 08 Jun 2015
Copeland Dueling Bandits M. Zoghi Zohar Karnin Shimon Whiteson Maarten de Rijke 102 89 0 01 Jun 2015
Contextual Dueling Bandits Miroslav Dudík Katja Hofmann Robert Schapire Aleksandrs Slivkins M. Zoghi 105 124 0 23 Feb 2015
Sparse Dueling Bandits Kevin Jamieson S. Katariya Atul Deshpande Robert D. Nowak 193 64 0 31 Jan 2015
Reducing Dueling Bandits to Cardinal Bandits Nir Ailon Thorsten Joachims Zohar Karnin 125 139 0 14 May 2014
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem M. Zoghi Shimon Whiteson Rémi Munos Maarten de Rijke 75 143 0 12 Dec 2013