1604.07101
Double Thompson Sampling for Dueling Bandits
25 April 2016
Huasen Wu
Xin Liu
Papers citing
"Double Thompson Sampling for Dueling Bandits"
15 / 15 papers shown
Title
Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
Fanzeng Xia
Hao Liu
Yisong Yue
Tongxin Li
69
1
0
03 Jan 2025
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
42
2
0
13 Jun 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
31
1
0
16 Apr 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
26
17
0
14 Feb 2024
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
Tian Huang
Ke Li
31
1
0
23 Nov 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
34
11
0
15 Mar 2023
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
48
6
0
25 Oct 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal
R. Ghuge
V. Nagarajan
25
1
0
25 Sep 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
38
8
0
14 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha
A. Krishnamurthy
42
35
0
24 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha
Shubham Gupta
33
10
0
06 Nov 2021
Preference learning along multiple criteria: A game-theoretic perspective
Kush S. Bhatia
A. Pananjady
Peter L. Bartlett
Anca Dragan
Martin J. Wainwright
35
13
0
05 May 2021
KLUCB Approach to Copeland Bandits
Nischal Agrawal
P. Chaporkar
16
1
0
07 Feb 2019
Multi-dueling Bandits with Dependent Arms
Yanan Sui
Vincent Zhuang
J. W. Burdick
Yisong Yue
25
80
0
29 Apr 2017
Preferential Bayesian Optimization
Javier I. González
Zhenwen Dai
Andreas C. Damianou
Neil D. Lawrence
23
110
0
12 Apr 2017