Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

24 November 2021
Aadirupa Saha
A. Krishnamurthy

Papers citing "Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability"

18 / 18 papers shown
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
83
0
0
04 Feb 2025
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
58
3
0
06 Oct 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
37
5
0
24 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
72
1
0
18 May 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
47
0
0
05 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
29
1
0
16 Apr 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
24
17
0
14 Feb 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
41
94
0
08 Jan 2024
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
31
54
0
29 May 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
39
181
0
26 Jan 2023
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
46
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
22
2
0
27 Sep 2022
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles
Aldo G. Carranza
Sanath Kumar Krishnamurthy
Susan Athey
16
1
0
30 Mar 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
36
8
0
14 Feb 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
81
0
08 Nov 2021
Optimal Dynamic Regret in Exp-Concave Online Learning
Dheeraj Baby
Yu-Xiang Wang
45
43
0
23 Apr 2021