2202.11795
Cited By
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits
23 February 2022
Suprovat Ghoshal
Aadirupa Saha
Papers citing "Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits"
11 / 11 papers shown
Title
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
83
0
0
04 Feb 2025
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
P. Jaillet
K. H. Low
37
5
0
24 Jul 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
72
1
0
18 May 2024
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
Aadirupa Saha
Hilal Asi
36
1
0
22 Mar 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources
Rohan Deb
Aadirupa Saha
25
0
0
28 Dec 2023
Faster Convergence with Multiway Preferences
Aadirupa Saha
Vitaly Feldman
Tomer Koren
Yishay Mansour
21
1
0
19 Dec 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
28
181
0
26 Jan 2023
One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits
Pierre Gaillard
Aadirupa Saha
Soham Dan
28
3
0
26 Oct 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
38
6
0
25 Oct 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
81
0
08 Nov 2021