ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.08816
  4. Cited By
Borda Regret Minimization for Generalized Linear Dueling Bandits

Borda Regret Minimization for Generalized Linear Dueling Bandits

15 March 2023
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
ArXivPDFHTML

Papers citing "Borda Regret Minimization for Generalized Linear Dueling Bandits"

8 / 8 papers shown
Title
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang
Ge Zhang
Yue Wu
Kangping Xu
Quanquan Gu
48
3
0
03 Oct 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
47
0
0
05 May 2024
Self-Play Preference Optimization for Language Model Alignment
Self-Play Preference Optimization for Language Model Alignment
Yue Wu
Zhiqing Sun
Huizhuo Yuan
Kaixuan Ji
Yiming Yang
Quanquan Gu
33
113
0
01 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
29
1
0
16 Apr 2024
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li
Heyang Zhao
Quanquan Gu
42
9
0
09 Apr 2024
Reinforcement Learning from Human Feedback with Active Queries
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
24
17
0
14 Feb 2024
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
Qiwei Di
Tao Jin
Yue Wu
Heyang Zhao
Farzad Farnoud
Quanquan Gu
18
11
0
02 Oct 2023
1