Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits


13 November 2018
Branislav Kveton
Csaba Szepesvári
Sharan Vaswani
Zheng Wen
Mohammad Ghavamzadeh
Tor Lattimore

Papers citing "Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits"

13 / 13 papers shown
Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali
Zhang-Wei Hong
Ayush Sekhari
Alexander Rakhlin
Pulkit Agrawal
38
3
0
18 Jul 2024
Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits
Ziyi Huang
Henry Lam
Haofeng Zhang
35
0
0
20 Jun 2024
Zero-Inflated Bandits
Haoyu Wei
Runzhe Wan
Lei Shi
Rui Song
44
0
0
25 Dec 2023
Multiplier Bootstrap-based Exploration
Runzhe Wan
Haoyu Wei
Branislav Kveton
R. Song
21
3
0
03 Feb 2023
A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing
Ruofeng Wen
Wenjun Zeng
Yi Liu
26
0
0
08 Sep 2022
An Analysis of Ensemble Sampling
Chao Qin
Zheng Wen
Xiuyuan Lu
Benjamin Van Roy
34
21
0
02 Mar 2022
Maillard Sampling: Boltzmann Exploration Done Optimally
Jieming Bian
Kwang-Sung Jun
24
12
0
05 Nov 2021
Policy Optimization as Online Learning with Mediator Feedback
Alberto Maria Metelli
Matteo Papini
P. D'Oro
Marcello Restelli
OffRL
27
10
0
15 Dec 2020
DORB: Dynamically Optimizing Multiple Rewards with Bandits
Ramakanth Pasunuru
Han Guo
Joey Tianyi Zhou
OffRL
32
6
0
15 Nov 2020
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration
Priyank Agrawal
Jinglin Chen
Nan Jiang
30
18
0
23 Oct 2020
BanditPAM: Almost Linear Time $k$-Medoids Clustering via Multi-Armed Bandits
Mo Tiwari
Martin Jinye Zhang
James Mayclin
Sebastian Thrun
Chris Piech
Ilan Shomorony
32
11
0
11 Jun 2020
Self-Supervised Contextual Bandits in Computer Vision
A. Deshmukh
Abhimanu Kumar
Levi Boyles
Denis Xavier Charles
Eren Manavoglu
Ürün Dogan
SSL
31
3
0
18 Mar 2020
Perturbed-History Exploration in Stochastic Linear Bandits
Branislav Kveton
Csaba Szepesvári
Mohammad Ghavamzadeh
Craig Boutilier
24
41
0
21 Mar 2019