PAC Battling Bandits in the Plackett-Luce Model

12 August 2018

Papers citing "PAC Battling Bandits in the Plackett-Luce Model"

Title | Authors | | Date
--- | --- | --- | ---
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback | Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, P. Jaillet, K. H. Low | 37 5 0 | 24 Jul 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback | Ruitao Chen, Liwei Wang | 72 1 0 | 18 May 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF | Banghua Zhu, Michael I. Jordan, Jiantao Jiao | 31 25 0 | 29 Jan 2024
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits | Thomas Kleine Buening, Aadirupa Saha | 38 6 0 | 25 Oct 2022
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits | Suprovat Ghoshal, Aadirupa Saha | 17 11 0 | 23 Feb 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences | Aadirupa Saha, Pierre Gaillard | 33 8 0 | 14 Feb 2022
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability | Aadirupa Saha, A. Krishnamurthy | 21 35 0 | 24 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences | Aldo Pacchiano, Aadirupa Saha, Jonathan Lee | 33 81 0 | 08 Nov 2021
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits | Aadirupa Saha, Shubham Gupta | 25 10 0 | 06 Nov 2021