Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1502.06362
Cited By
Contextual Dueling Bandits
23 February 2015
Miroslav Dudík
Katja Hofmann
Robert Schapire
Aleksandrs Slivkins
M. Zoghi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Contextual Dueling Bandits"
33 / 33 papers shown
Title
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu
Shuji Kijima
42
0
0
08 May 2025
Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam
Thomas L. Griffiths
LLMAG
94
1
0
29 Apr 2025
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Emiliano Penaloza
Tianyue H. Zhan
Laurent Charlin
Mateo Espinosa Zarlenga
51
0
0
25 Apr 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
55
0
0
20 Apr 2025
Cost-Aware Optimal Pairwise Pure Exploration
Di Wu
Chengshuai Shi
Ruida Zhou
Cong Shen
41
0
0
10 Mar 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
50
0
0
09 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
81
3
0
06 Oct 2024
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
38
1
0
26 Aug 2024
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
42
2
0
13 Jun 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
49
0
0
05 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
31
1
0
16 Apr 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
26
17
0
14 Feb 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
51
96
0
08 Jan 2024
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang
Jian Tao
Jiafei Lyu
Chunjiang Ge
Jiaxin Chen
Qimai Li
Weihan Shen
Xiaolong Zhu
Xiu Li
EGVM
23
89
0
22 Nov 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
44
55
0
30 Sep 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
34
11
0
15 Mar 2023
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
48
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
30
2
0
27 Sep 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal
R. Ghuge
V. Nagarajan
25
1
0
25 Sep 2022
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles
Aldo G. Carranza
Sanath Kumar Krishnamurthy
Susan Athey
21
1
0
30 Mar 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
38
8
0
14 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha
A. Krishnamurthy
42
35
0
24 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha
Shubham Gupta
33
10
0
06 Nov 2021
Preference learning along multiple criteria: A game-theoretic perspective
Kush S. Bhatia
A. Pananjady
Peter L. Bartlett
Anca Dragan
Martin J. Wainwright
35
13
0
05 May 2021
Preference-based Reinforcement Learning with Finite-Time Guarantees
Yichong Xu
Ruosong Wang
Lin F. Yang
Aarti Singh
A. Dubrawski
36
53
0
16 Jun 2020
Re-evaluating Evaluation
David Balduzzi
K. Tuyls
Julien Perolat
T. Graepel
MoMe
30
97
0
07 Jun 2018
Multi-dueling Bandits with Dependent Arms
Yanan Sui
Vincent Zhuang
J. W. Burdick
Yisong Yue
25
80
0
29 Apr 2017
Preferential Bayesian Optimization
Javier I. González
Zhenwen Dai
Andreas C. Damianou
Neil D. Lawrence
23
110
0
12 Apr 2017
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
Junpei Komiyama
Junya Honda
Hiroshi Nakagawa
14
39
0
05 May 2016
Double Thompson Sampling for Dueling Bandits
Huasen Wu
Xin Liu
22
87
0
25 Apr 2016
Copeland Dueling Bandits
M. Zoghi
Zohar Karnin
Shimon Whiteson
Maarten de Rijke
23
89
0
01 Jun 2015
1