Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

13 November 2018

Papers citing "Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits"

50 / 50 papers shown

Title
BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms Yunlong Hou Fengzhuo Zhang Cunxiao Du Xuan Zhang Jiachun Pan Tianyu Pang Chao Du Vincent Y. F. Tan Zhuoran Yang OffRL 122 1 0 21 May 2025
QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits Benjamin Howson Sarah Filippi Ciara Pike-Burke 72 1 0 31 Oct 2024
Random Latent Exploration for Deep Reinforcement Learning Srinath Mahankali Zhang-Wei Hong Ayush Sekhari Alexander Rakhlin Pulkit Agrawal 264 3 0 18 Jul 2024
Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits Ziyi Huang Henry Lam Haofeng Zhang 89 0 0 20 Jun 2024
Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility Yuantong Li Guang Cheng Xiaowu Dai 72 1 0 04 Jun 2024
Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models Masatoshi Uehara Yulai Zhao Ehsan Hajiramezanali Gabriele Scalia Gökçen Eraslan Avantika Lal Sergey Levine Tommaso Biancalani 133 16 0 30 May 2024
Feedback Efficient Online Fine-Tuning of Diffusion Models Masatoshi Uehara Yulai Zhao Kevin Black Ehsan Hajiramezanali Gabriele Scalia N. Diamant Alex Tseng Sergey Levine Tommaso Biancalani 121 28 0 26 Feb 2024
Randomized Confidence Bounds for Stochastic Partial Monitoring M. Heuillet Ola Ahmad Audrey Durand 96 1 0 07 Feb 2024
Zero-Inflated Bandits Haoyu Wei Runzhe Wan Lei Shi Rui Song 111 0 0 25 Dec 2023
Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem Elena Gal Shaun Singh Aldo Pacchiano Benjamin Walker Terry Lyons Jakob N. Foerster FaML 67 0 0 15 Aug 2023
Differential Good Arm Identification Yun-Da Tsai Tzu-Hsien Tsai Shou-De Lin 22 6 0 13 Mar 2023
A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms Dorian Baudry Kazuya Suzuki Junya Honda 60 5 0 10 Mar 2023
Multiplier Bootstrap-based Exploration Runzhe Wan Haoyu Wei Branislav Kveton R. Song 52 3 0 03 Feb 2023
Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees D. Tiapkin Denis Belomestny Daniele Calandriello Eric Moulines Rémi Munos A. Naumov Mark Rowland Michal Valko Pierre Menard 105 10 0 28 Sep 2022
A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing Ruofeng Wen Wenjun Zeng Yi Liu 59 0 0 08 Sep 2022
An Analysis of Ensemble Sampling Chao Qin Zheng Wen Xiuyuan Lu Benjamin Van Roy 126 22 0 02 Mar 2022
Residual Bootstrap Exploration for Stochastic Linear Bandit Shuang Wu ChiHua Wang Yuantong Li Guang Cheng 81 8 0 23 Feb 2022
Learning Neural Contextual Bandits Through Perturbed Rewards Yiling Jia Weitong Zhang Dongruo Zhou Quanquan Gu Hongning Wang 139 14 0 24 Jan 2022
Neural Pseudo-Label Optimism for the Bank Loan Problem Aldo Pacchiano Shaun Singh Edward Chou Alexander C. Berg Jakob N. Foerster 49 7 0 03 Dec 2021
From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits Dorian Baudry Patrick Saux Odalric-Ambrym Maillard 65 7 0 18 Nov 2021
Maillard Sampling: Boltzmann Exploration Done Optimally Jieming Bian Kwang-Sung Jun 62 13 0 05 Nov 2021
Debiasing Samples from Online Learning Using Bootstrap Ningyuan Chen Xuefeng Gao Yi Xiong OffRL OnRL 64 4 0 31 Jul 2021
GuideBoot: Guided Bootstrap for Deep Contextual Bandits Feiyang Pan Haoming Li Xiang Ao Wei Wang Yanrong Kang Ao Tan Qing He 40 0 0 18 Jul 2021
Random Effect Bandits Rong Zhu Branislav Kveton 44 4 0 23 Jun 2021
On Limited-Memory Subsampling Strategies for Bandits Dorian Baudry Yoan Russac Olivier Cappé 117 8 0 21 Jun 2021
Multi-armed Bandit Requiring Monotone Arm Sequences Ningyuan Chen 144 11 0 07 Jun 2021
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks Rong Zhu Mattia Rigotti 65 7 0 10 May 2021
CORe: Capitalizing On Rewards in Bandit Exploration Nan Wang Branislav Kveton Maryam Karimzadehgan 20 2 0 07 Mar 2021
Policy Optimization as Online Learning with Mediator Feedback Alberto Maria Metelli Matteo Papini P. D'Oro Marcello Restelli OffRL 58 10 0 15 Dec 2020
DORB: Dynamically Optimizing Multiple Rewards with Bandits Ramakanth Pasunuru Han Guo Joey Tianyi Zhou OffRL 72 7 0 15 Nov 2020
Sub-sampling for Efficient Non-Parametric Bandit Exploration Dorian Baudry E. Kaufmann Odalric-Ambrym Maillard 55 14 0 27 Oct 2020
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration Priyank Agrawal Jinglin Chen Nan Jiang 117 21 0 23 Oct 2020
Learning to Rank under Multinomial Logit Choice James A. Grant David S. Leslie 39 0 0 07 Sep 2020
Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation Ilya Kostrikov Ofir Nachum OffRL 70 31 0 27 Jul 2020
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems Tong Yu Branislav Kveton Zheng Wen Ruiyi Zhang Ole J. Mengshoel TDI 31 2 0 09 Jul 2020
Towards Tractable Optimism in Model-Based Reinforcement Learning Aldo Pacchiano Philip J. Ball Jack Parker-Holder K. Choromanski Stephen J. Roberts OffRL 61 12 0 21 Jun 2020
BanditPAM: Almost Linear Time $k$-Medoids Clustering via Multi-Armed Bandits Mo Tiwari Martin Jinye Zhang James Mayclin Sebastian Thrun Chris Piech Ilan Shomorony 56 11 0 11 Jun 2020
Meta-Learning Bandit Policies by Gradient Ascent Branislav Kveton Martin Mladenov Chih-Wei Hsu Manzil Zaheer Csaba Szepesvári Craig Boutilier 76 9 0 09 Jun 2020
Differentiable Linear Bandit Algorithm Kaige Yang Laura Toni 65 6 0 04 Jun 2020
Self-Supervised Contextual Bandits in Computer Vision A. Deshmukh Abhimanu Kumar Levi Boyles Denis Xavier Charles Eren Manavoglu Ürün Dogan SSL 55 3 0 18 Mar 2020
Residual Bootstrap Exploration for Bandit Algorithms ChiHua Wang Yang Yu Botao Hao Guang Cheng 64 16 0 19 Feb 2020
Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles Dylan J. Foster Alexander Rakhlin 373 213 0 12 Feb 2020
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems Sharan Vaswani Abbas Mehrabian A. Durand Branislav Kveton 80 28 0 11 Oct 2019
Thompson Sampling with Approximate Inference My Phan Yasin Abbasi-Yadkori Justin Domke 64 29 0 14 Aug 2019
Randomized Exploration in Generalized Linear Bandits Branislav Kveton Manzil Zaheer Csaba Szepesvári Lihong Li Mohammad Ghavamzadeh Craig Boutilier 95 98 0 21 Jun 2019
Bootstrapping Upper Confidence Bound Botao Hao Yasin Abbasi-Yadkori Zheng Wen Guang Cheng 116 55 0 12 Jun 2019
Perturbed-History Exploration in Stochastic Linear Bandits Branislav Kveton Csaba Szepesvári Mohammad Ghavamzadeh Craig Boutilier 67 43 0 21 Mar 2019
On Applications of Bootstrap in Continuous Space Reinforcement Learning Mohamad Kazem Shirani Faradonbeh Ambuj Tewari George Michailidis 75 12 0 14 Mar 2019
Perturbed-History Exploration in Stochastic Multi-Armed Bandits Branislav Kveton Csaba Szepesvári Mohammad Ghavamzadeh Craig Boutilier 79 31 0 26 Feb 2019
Linear Bandits with Stochastic Delayed Feedback Claire Vernade Alexandra Carpentier Tor Lattimore Giovanni Zappella Beyza Ermis M. Brueckner 113 67 0 05 Jul 2018