arXiv:1811.05154
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
13 November 2018
Branislav Kveton, Csaba Szepesvári, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore
Papers citing "Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits"
50 / 50 papers shown
BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms (21 May 2025) [OffRL]
Yunlong Hou, Fengzhuo Zhang, Cunxiao Du, Xuan Zhang, Jiachun Pan, Tianyu Pang, Chao Du, Vincent Y. F. Tan, Zhuoran Yang

QuACK: A Multipurpose Queuing Algorithm for Cooperative k-Armed Bandits (31 Oct 2024)
Benjamin Howson, Sarah Filippi, Ciara Pike-Burke

Random Latent Exploration for Deep Reinforcement Learning (18 Jul 2024)
Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal

Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits (20 Jun 2024)
Ziyi Huang, Henry Lam, Haofeng Zhang

Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility (04 Jun 2024)
Yuantong Li, Guang Cheng, Xiaowu Dai
Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models (30 May 2024)
Masatoshi Uehara, Yulai Zhao, Ehsan Hajiramezanali, Gabriele Scalia, Gökçen Eraslan, Avantika Lal, Sergey Levine, Tommaso Biancalani

Feedback Efficient Online Fine-Tuning of Diffusion Models (26 Feb 2024)
Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, N. Diamant, Alex Tseng, Sergey Levine, Tommaso Biancalani

Randomized Confidence Bounds for Stochastic Partial Monitoring (07 Feb 2024)
M. Heuillet, Ola Ahmad, Audrey Durand

Zero-Inflated Bandits (25 Dec 2023)
Haoyu Wei, Runzhe Wan, Lei Shi, Rui Song
Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem (15 Aug 2023) [FaML]
Elena Gal, Shaun Singh, Aldo Pacchiano, Benjamin Walker, Terry Lyons, Jakob N. Foerster

Differential Good Arm Identification (13 Mar 2023)
Yun-Da Tsai, Tzu-Hsien Tsai, Shou-De Lin

A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms (10 Mar 2023)
Dorian Baudry, Kazuya Suzuki, Junya Honda

Multiplier Bootstrap-based Exploration (03 Feb 2023)
Runzhe Wan, Haoyu Wei, Branislav Kveton, R. Song
Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees (28 Sep 2022)
D. Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, A. Naumov, Mark Rowland, Michal Valko, Pierre Menard

A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing (08 Sep 2022)
Ruofeng Wen, Wenjun Zeng, Yi Liu

An Analysis of Ensemble Sampling (02 Mar 2022)
Chao Qin, Zheng Wen, Xiuyuan Lu, Benjamin Van Roy

Residual Bootstrap Exploration for Stochastic Linear Bandit (23 Feb 2022)
Shuang Wu, ChiHua Wang, Yuantong Li, Guang Cheng

Learning Neural Contextual Bandits Through Perturbed Rewards (24 Jan 2022)
Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang
Neural Pseudo-Label Optimism for the Bank Loan Problem (03 Dec 2021)
Aldo Pacchiano, Shaun Singh, Edward Chou, Alexander C. Berg, Jakob N. Foerster

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits (18 Nov 2021)
Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard

Maillard Sampling: Boltzmann Exploration Done Optimally (05 Nov 2021)
Jieming Bian, Kwang-Sung Jun

Debiasing Samples from Online Learning Using Bootstrap (31 Jul 2021) [OffRL, OnRL]
Ningyuan Chen, Xuefeng Gao, Yi Xiong
GuideBoot: Guided Bootstrap for Deep Contextual Bandits (18 Jul 2021)
Feiyang Pan, Haoming Li, Xiang Ao, Wei Wang, Yanrong Kang, Ao Tan, Qing He

Random Effect Bandits (23 Jun 2021)
Rong Zhu, Branislav Kveton

On Limited-Memory Subsampling Strategies for Bandits (21 Jun 2021)
Dorian Baudry, Yoan Russac, Olivier Cappé

Multi-armed Bandit Requiring Monotone Arm Sequences (07 Jun 2021)
Ningyuan Chen

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks (10 May 2021)
Rong Zhu, Mattia Rigotti
CORe: Capitalizing On Rewards in Bandit Exploration (07 Mar 2021)
Nan Wang, Branislav Kveton, Maryam Karimzadehgan

Policy Optimization as Online Learning with Mediator Feedback (15 Dec 2020) [OffRL]
Alberto Maria Metelli, Matteo Papini, P. D'Oro, Marcello Restelli

DORB: Dynamically Optimizing Multiple Rewards with Bandits (15 Nov 2020) [OffRL]
Ramakanth Pasunuru, Han Guo, Joey Tianyi Zhou

Sub-sampling for Efficient Non-Parametric Bandit Exploration (27 Oct 2020)
Dorian Baudry, E. Kaufmann, Odalric-Ambrym Maillard

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration (23 Oct 2020)
Priyank Agrawal, Jinglin Chen, Nan Jiang
Learning to Rank under Multinomial Logit Choice (07 Sep 2020)
James A. Grant, David S. Leslie

Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation (27 Jul 2020) [OffRL]
Ilya Kostrikov, Ofir Nachum

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems (09 Jul 2020) [TDI]
Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Towards Tractable Optimism in Model-Based Reinforcement Learning (21 Jun 2020) [OffRL]
Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, K. Choromanski, Stephen J. Roberts

BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits (11 Jun 2020)
Mo Tiwari, Martin Jinye Zhang, James Mayclin, Sebastian Thrun, Chris Piech, Ilan Shomorony
Meta-Learning Bandit Policies by Gradient Ascent (09 Jun 2020)
Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvári, Craig Boutilier

Differentiable Linear Bandit Algorithm (04 Jun 2020)
Kaige Yang, Laura Toni

Self-Supervised Contextual Bandits in Computer Vision (18 Mar 2020) [SSL]
A. Deshmukh, Abhimanu Kumar, Levi Boyles, Denis Xavier Charles, Eren Manavoglu, Ürün Dogan

Residual Bootstrap Exploration for Bandit Algorithms (19 Feb 2020)
ChiHua Wang, Yang Yu, Botao Hao, Guang Cheng

Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles (12 Feb 2020)
Dylan J. Foster, Alexander Rakhlin
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems (11 Oct 2019)
Sharan Vaswani, Abbas Mehrabian, A. Durand, Branislav Kveton

Thompson Sampling with Approximate Inference (14 Aug 2019)
My Phan, Yasin Abbasi-Yadkori, Justin Domke

Randomized Exploration in Generalized Linear Bandits (21 Jun 2019)
Branislav Kveton, Manzil Zaheer, Csaba Szepesvári, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

Bootstrapping Upper Confidence Bound (12 Jun 2019)
Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

Perturbed-History Exploration in Stochastic Linear Bandits (21 Mar 2019)
Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier
On Applications of Bootstrap in Continuous Space Reinforcement Learning (14 Mar 2019)
Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Perturbed-History Exploration in Stochastic Multi-Armed Bandits (26 Feb 2019)
Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier

Linear Bandits with Stochastic Delayed Feedback (05 Jul 2018)
Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, M. Brueckner