ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.11795
  4. Cited By
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank
  Preference Bandits

Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits

23 February 2022
Suprovat Ghoshal
Aadirupa Saha
ArXivPDFHTML

Papers citing "Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits"

11 / 11 papers shown
Title
Online Clustering of Dueling Bandits
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
83
0
0
04 Feb 2025
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
P. Jaillet
K. H. Low
37
5
0
24 Jul 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
72
1
0
18 May 2024
DP-Dueling: Learning from Preference Feedback without Compromising User
  Privacy
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
Aadirupa Saha
Hilal Asi
36
1
0
22 Mar 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and
  Overoptimization in RLHF
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
Think Before You Duel: Understanding Complexities of Preference Learning
  under Constrained Resources
Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources
Rohan Deb
Aadirupa Saha
30
0
0
28 Dec 2023
Faster Convergence with Multiway Preferences
Faster Convergence with Multiway Preferences
Aadirupa Saha
Vitaly Feldman
Tomer Koren
Yishay Mansour
23
1
0
19 Dec 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or
  $K$-wise Comparisons
Principled Reinforcement Learning with Human Feedback from Pairwise or KKK-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
30
181
0
26 Jan 2023
One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret
  Guarantees in Sleeping Bandits
One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits
Pierre Gaillard
Aadirupa Saha
Soham Dan
28
3
0
26 Oct 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive
  Non-Stationary Dueling Bandits
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
40
6
0
25 Oct 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
81
0
08 Nov 2021
1