ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1502.06362
  4. Cited By
Contextual Dueling Bandits

Contextual Dueling Bandits

23 February 2015
Miroslav Dudík
Katja Hofmann
Robert Schapire
Aleksandrs Slivkins
M. Zoghi
ArXivPDFHTML

Papers citing "Contextual Dueling Bandits"

33 / 33 papers shown
Title
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu
Shuji Kijima
42
0
0
08 May 2025
Toward Efficient Exploration by Large Language Model Agents
Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam
Thomas L. Griffiths
LLMAG
94
1
0
29 Apr 2025
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Emiliano Penaloza
Tianyue H. Zhan
Laurent Charlin
Mateo Espinosa Zarlenga
51
0
0
25 Apr 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
55
0
0
20 Apr 2025
Cost-Aware Optimal Pairwise Pure Exploration
Di Wu
Chengshuai Shi
Ruida Zhou
Cong Shen
41
0
0
10 Mar 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
50
0
0
09 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
81
3
0
06 Oct 2024
Biased Dueling Bandits with Stochastic Delayed Feedback
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
38
1
0
26 Aug 2024
Online Bandit Learning with Offline Preference Data for Improved RLHF
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
42
2
0
13 Jun 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
49
0
0
05 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
31
1
0
16 Apr 2024
Reinforcement Learning from Human Feedback with Active Queries
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
26
17
0
14 Feb 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
51
96
0
08 Jan 2024
Using Human Feedback to Fine-tune Diffusion Models without Any Reward
  Model
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang
Jian Tao
Jiafei Lyu
Chunjiang Ge
Jiaxin Chen
Qimai Li
Weihan Shen
Xiaolong Zhu
Xiu Li
EGVM
23
89
0
22 Nov 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for
  LLM Alignment
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
44
55
0
30 Sep 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
34
11
0
15 Mar 2023
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive
  Non-Stationary Dueling Bandits
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
48
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
30
2
0
27 Sep 2022
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit
  Problem
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Arpit Agarwal
R. Ghuge
V. Nagarajan
25
1
0
25 Sep 2022
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment
  Effect Oracles
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles
Aldo G. Carranza
Sanath Kumar Krishnamurthy
Susan Athey
21
1
0
30 Mar 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online
  Learning from Preferences
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
38
8
0
14 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under
  Realizability
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Aadirupa Saha
A. Krishnamurthy
42
35
0
24 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary
  Dueling Bandits
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha
Shubham Gupta
33
10
0
06 Nov 2021
Preference learning along multiple criteria: A game-theoretic
  perspective
Preference learning along multiple criteria: A game-theoretic perspective
Kush S. Bhatia
A. Pananjady
Peter L. Bartlett
Anca Dragan
Martin J. Wainwright
35
13
0
05 May 2021
Preference-based Reinforcement Learning with Finite-Time Guarantees
Preference-based Reinforcement Learning with Finite-Time Guarantees
Yichong Xu
Ruosong Wang
Lin F. Yang
Aarti Singh
A. Dubrawski
36
53
0
16 Jun 2020
Re-evaluating Evaluation
Re-evaluating Evaluation
David Balduzzi
K. Tuyls
Julien Perolat
T. Graepel
MoMe
30
97
0
07 Jun 2018
Multi-dueling Bandits with Dependent Arms
Multi-dueling Bandits with Dependent Arms
Yanan Sui
Vincent Zhuang
J. W. Burdick
Yisong Yue
25
80
0
29 Apr 2017
Preferential Bayesian Optimization
Preferential Bayesian Optimization
Javier I. González
Zhenwen Dai
Andreas C. Damianou
Neil D. Lawrence
23
110
0
12 Apr 2017
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm,
  and Computationally Efficient Algorithm
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
Junpei Komiyama
Junya Honda
Hiroshi Nakagawa
14
39
0
05 May 2016
Double Thompson Sampling for Dueling Bandits
Double Thompson Sampling for Dueling Bandits
Huasen Wu
Xin Liu
22
87
0
25 Apr 2016
Copeland Dueling Bandits
Copeland Dueling Bandits
M. Zoghi
Zohar Karnin
Shimon Whiteson
Maarten de Rijke
23
89
0
01 Jun 2015
1