PAC Battling Bandits in the Plackett-Luce Model

12 August 2018
Aadirupa Saha, Aditya Gopalan
arXiv:1808.04008

Papers citing "PAC Battling Bandits in the Plackett-Luce Model"

9 of 9 papers shown.
  • Neural Dueling Bandits: Preference-Based Optimization with Human Feedback. Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, P. Jaillet, K. H. Low. 24 Jul 2024.
  • The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback. Ruitao Chen, Liwei Wang. 18 May 2024.
  • Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF. Banghua Zhu, Michael I. Jordan, Jiantao Jiao. 29 Jan 2024.
  • ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits. Thomas Kleine Buening, Aadirupa Saha. 25 Oct 2022.
  • Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits. Suprovat Ghoshal, Aadirupa Saha. 23 Feb 2022.
  • Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences. Aadirupa Saha, Pierre Gaillard. 14 Feb 2022.
  • Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability. Aadirupa Saha, A. Krishnamurthy. 24 Nov 2021.
  • Dueling RL: Reinforcement Learning with Trajectory Preferences. Aldo Pacchiano, Aadirupa Saha, Jonathan Lee. 08 Nov 2021.
  • Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits. Aadirupa Saha, Shubham Gupta. 06 Nov 2021.