Contextual Dueling Bandits

23 February 2015

Papers citing "Contextual Dueling Bandits"

Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits. Shang Lu, Shuji Kijima. 42 0 0. 08 May 2025.
Toward Efficient Exploration by Large Language Model Agents. Dilip Arumugam, Thomas L. Griffiths. [LLMAG] 94 1 0. 29 Apr 2025.
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization. Emiliano Penaloza, Tianyue H. Zhan, Laurent Charlin, Mateo Espinosa Zarlenga. 51 0 0. 25 Apr 2025.
Reinforcement Learning from Multi-level and Episodic Human Feedback. Muhammad Qasim Elahi, Somtochukwu Oguchienti, Maheed H. Ahmed, Mahsa Ghasemi. [OffRL] 55 0 0. 20 Apr 2025.
Cost-Aware Optimal Pairwise Pure Exploration. Di Wu, Chengshuai Shi, Ruida Zhou, Cong Shen. 41 0 0. 10 Mar 2025.
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability. Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Tong Zhang, Q. Gu. [OffRL] 50 0 0. 09 Feb 2025.
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF. Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang. [OffRL] 57 3 0. 07 Nov 2024.
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF. Zhaolin Gao, Wenhao Zhan, Jonathan D. Chang, Gokul Swamy, Kianté Brantley, Jason D. Lee, Wen Sun. [OffRL] 81 3 0. 06 Oct 2024.
Biased Dueling Bandits with Stochastic Delayed Feedback. Bongsoo Yi, Yue Kang, Yao Li. 38 1 0. 26 Aug 2024.
Online Bandit Learning with Offline Preference Data for Improved RLHF. Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen. [OffRL] 42 2 0. 13 Jun 2024.
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis. Qining Zhang, Honghao Wei, Lei Ying. [OffRL] 67 1 0. 11 Jun 2024.
Active Preference Learning for Ordering Items In- and Out-of-sample. Herman Bergström, Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson. 49 0 0. 05 May 2024.
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback. Qiwei Di, Jiafan He, Quanquan Gu. 31 1 0. 16 Apr 2024.
Reinforcement Learning from Human Feedback with Active Queries. Kaixuan Ji, Jiafan He, Quanquan Gu. 26 17 0. 14 Feb 2024.
A Minimaximalist Approach to Reinforcement Learning from Human Feedback. Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal. [OffRL] 51 96 0. 08 Jan 2024.
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model. Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li. [EGVM] 23 89 0. 22 Nov 2023.
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment. Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao. 44 55 0. 30 Sep 2023.
Borda Regret Minimization for Generalized Linear Dueling Bandits. Yue Wu, Tao Jin, Hao Lou, Farzad Farnoud, Quanquan Gu. 34 11 0. 15 Mar 2023.
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits. Thomas Kleine Buening, Aadirupa Saha. 48 6 0. 25 Oct 2022.
Dueling Convex Optimization with General Preferences. Aadirupa Saha, Tomer Koren, Yishay Mansour. 30 2 0. 27 Sep 2022.
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem. Arpit Agarwal, R. Ghuge, V. Nagarajan. 25 1 0. 25 Sep 2022.
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles. Aldo G. Carranza, Sanath Kumar Krishnamurthy, Susan Athey. 21 1 0. 30 Mar 2022.
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences. Aadirupa Saha, Pierre Gaillard. 38 8 0. 14 Feb 2022.
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability. Aadirupa Saha, A. Krishnamurthy. 42 35 0. 24 Nov 2021.
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits. Aadirupa Saha, Shubham Gupta. 33 10 0. 06 Nov 2021.
Preference learning along multiple criteria: A game-theoretic perspective. Kush S. Bhatia, A. Pananjady, Peter L. Bartlett, Anca Dragan, Martin J. Wainwright. 35 13 0. 05 May 2021.
Preference-based Reinforcement Learning with Finite-Time Guarantees. Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, A. Dubrawski. 36 53 0. 16 Jun 2020.
Re-evaluating Evaluation. David Balduzzi, K. Tuyls, Julien Perolat, T. Graepel. [MoMe] 30 97 0. 07 Jun 2018.
Multi-dueling Bandits with Dependent Arms. Yanan Sui, Vincent Zhuang, J. W. Burdick, Yisong Yue. 25 80 0. 29 Apr 2017.
Preferential Bayesian Optimization. Javier I. González, Zhenwen Dai, Andreas C. Damianou, Neil D. Lawrence. 23 110 0. 12 Apr 2017.
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm. Junpei Komiyama, Junya Honda, Hiroshi Nakagawa. 14 39 0. 05 May 2016.
Double Thompson Sampling for Dueling Bandits. Huasen Wu, Xin Liu. 22 87 0. 25 Apr 2016.
Copeland Dueling Bandits. M. Zoghi, Zohar Karnin, Shimon Whiteson, Maarten de Rijke. 23 89 0. 01 Jun 2015.