Improved Optimistic Algorithms for Logistic Bandits
18 February 2020
Louis Faury
Marc Abeille
Clément Calauzènes
Olivier Fercoq
arXiv: 2002.07530
Papers citing "Improved Optimistic Algorithms for Logistic Bandits"
27 / 27 papers shown
Neural Logistic Bandits
Seoungbin Bae
Dabeen Lee
213
0
0
04 May 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
55
0
0
20 Apr 2025
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
Long-Fei Li
Yu-Jie Zhang
Peng Zhao
Zhi-Hua Zhou
103
4
0
17 Jan 2025
Near Optimal Pure Exploration in Logistic Bandits
Eduardo Ochoa Rivera
Ambuj Tewari
30
0
0
28 Oct 2024
Advances in Preference-based Reinforcement Learning: A Review
Youssef Abdelkareem
Shady Shehata
Fakhri Karray
OffRL
51
9
0
21 Aug 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
37
5
0
24 Jul 2024
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
Junghyun Lee
Se-Young Yun
Kwang-Sung Jun
33
4
0
19 Jul 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
75
1
0
18 May 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
47
0
0
05 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Li Zhao
Xinle Cheng
Jiang Bian
Di He
Liwei Wang
60
57
0
29 Apr 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
29
1
0
16 Apr 2024
Generalized Linear Bandits with Limited Adaptivity
Ayush Sawarni
Nirjhar Das
Siddharth Barman
Gaurav Sinha
42
3
0
10 Apr 2024
Horizon-Free Regret for Linear Markov Decision Processes
Zihan Zhang
Jason D. Lee
Yuxin Chen
Simon S. Du
33
3
0
15 Mar 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
Time-Uniform Confidence Spheres for Means of Random Vectors
Ben Chugg
Hongjian Wang
Aaditya Ramdas
51
5
0
14 Nov 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
34
11
0
15 Mar 2023
Revisiting Weighted Strategy for Non-stationary Parametric Bandits
Jing Wang
Peng Zhao
Zhihong Zhou
38
5
0
05 Mar 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
183
0
26 Jan 2023
Risk-aware linear bandits with convex loss
Patrick Saux
Odalric-Ambrym Maillard
24
2
0
15 Sep 2022
Contextual Bandits with Knapsacks for a Conversion Model
Zerui Li
Gilles Stoltz
66
3
0
01 Jun 2022
Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits
Gergely Neu
Julia Olkhovskaya
Matteo Papini
Ludovic Schwartz
33
16
0
27 May 2022
An Experimental Design Approach for Regret Minimization in Logistic Bandits
Blake Mason
Kwang-Sung Jun
Lalit P. Jain
26
10
0
04 Feb 2022
Jointly Efficient and Optimal Algorithms for Logistic Bandits
Louis Faury
Marc Abeille
Kwang-Sung Jun
Clément Calauzènes
30
19
0
06 Jan 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
82
0
08 Nov 2021
Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification
James A. Grant
David S. Leslie
44
3
0
29 Sep 2021
UCB-based Algorithms for Multinomial Logistic Regression Bandits
Sanae Amani
Christos Thrampoulidis
34
10
0
21 Mar 2021
Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
Zihan Zhang
Jiaqi Yang
Xiangyang Ji
S. Du
71
36
0
29 Jan 2021