ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.00968
  4. Cited By
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

2 October 2023
Qiwei Di
Tao Jin
Yue Wu
Heyang Zhao
Farzad Farnoud
Quanquan Gu
ArXivPDFHTML

Papers citing "Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits"

15 / 15 papers shown
Title
Active Human Feedback Collection via Neural Contextual Dueling Bandits
Active Human Feedback Collection via Neural Contextual Dueling Bandits
Arun Verma
Xiaoqiang Lin
Zhongxiang Dai
Daniela Rus
Bryan Kian Hsiang Low
37
0
0
16 Apr 2025
Online Clustering of Dueling Bandits
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
83
0
0
04 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Optimal Design for Reward Modeling in RLHF
Optimal Design for Reward Modeling in RLHF
Antoine Scheid
Etienne Boursier
Alain Durmus
Michael I. Jordan
Pierre Ménard
Eric Moulines
Michal Valko
OffRL
53
6
0
22 Oct 2024
How Does Variance Shape the Regret in Contextual Bandits?
How Does Variance Shape the Regret in Contextual Bandits?
Zeyu Jia
Jian Qian
Alexander Rakhlin
Chen-Yu Wei
35
4
0
16 Oct 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
37
5
0
24 Jul 2024
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust
  Dueling
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
Yuwei Cheng
Fan Yao
Xuefeng Liu
Haifeng Xu
46
1
0
18 May 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
47
0
0
05 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
29
1
0
16 Apr 2024
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li
Heyang Zhao
Quanquan Gu
42
9
0
09 Apr 2024
Reinforcement Learning from Human Feedback with Active Queries
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
24
17
0
14 Feb 2024
Borda Regret Minimization for Generalized Linear Dueling Bandits
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
34
11
0
15 Mar 2023
Computationally Efficient Horizon-Free Reinforcement Learning for Linear
  Mixture MDPs
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs
Dongruo Zhou
Quanquan Gu
81
43
0
23 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
366
12,003
0
04 Mar 2022
Improved Variance-Aware Confidence Sets for Linear Bandits and Linear
  Mixture MDP
Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
Zihan Zhang
Jiaqi Yang
Xiangyang Ji
S. Du
71
36
0
29 Jan 2021
1