Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

24 November 2021

Papers citing "Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability"

18 / 18 papers shown

Title
Online Clustering of Dueling Bandits Zhiyong Wang Jiahang Sun Mingze Kong Jize Xie Qinghua Hu J. C. Lui Zhongxiang Dai 83 0 0 04 Feb 2025
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF Zhaolin Gao Wenhao Zhan Jonathan D. Chang Gokul Swamy Kianté Brantley Jason D. Lee Wen Sun OffRL 58 3 0 06 Oct 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback Arun Verma Zhongxiang Dai Xiaoqiang Lin Patrick Jaillet K. H. Low 37 5 0 24 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis Qining Zhang Honghao Wei Lei Ying OffRL 67 1 0 11 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback Ruitao Chen Liwei Wang 72 1 0 18 May 2024
Active Preference Learning for Ordering Items In- and Out-of-sample Herman Bergström Emil Carlsson Devdatt Dubhashi Fredrik D. Johansson 47 0 0 05 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback Qiwei Di Jiafan He Quanquan Gu 29 1 0 16 Apr 2024
Reinforcement Learning from Human Feedback with Active Queries Kaixuan Ji Jiafan He Quanquan Gu 24 17 0 14 Feb 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF Banghua Zhu Michael I. Jordan Jiantao Jiao 31 25 0 29 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback Gokul Swamy Christoph Dann Rahul Kidambi Zhiwei Steven Wu Alekh Agarwal OffRL 41 94 0 08 Jan 2024
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism Zihao Li Zhuoran Yang Mengdi Wang OffRL 31 54 0 29 May 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$ -wise Comparisons Banghua Zhu Jiantao Jiao Michael I. Jordan OffRL 39 181 0 26 Jan 2023
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits Thomas Kleine Buening Aadirupa Saha 46 6 0 25 Oct 2022
Dueling Convex Optimization with General Preferences Aadirupa Saha Tomer Koren Yishay Mansour 22 2 0 27 Sep 2022
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles Aldo G. Carranza Sanath Kumar Krishnamurthy Susan Athey 16 1 0 30 Mar 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences Aadirupa Saha Pierre Gaillard 36 8 0 14 Feb 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences Aldo Pacchiano Aadirupa Saha Jonathan Lee 33 81 0 08 Nov 2021
Optimal Dynamic Regret in Exp-Concave Online Learning Dheeraj Baby Yu-Xiang Wang 45 43 0 23 Apr 2021