arXiv:1602.07182
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
23 February 2016
Aurélien Garivier
Pierre Ménard
Gilles Stoltz
Papers citing "Explore First, Exploit Next: The True Shape of Regret in Bandit Problems" (32 papers shown):
1. On Stopping Times of Power-one Sequential Tests: Tight Lower and Upper Bounds. Shubhada Agrawal, Aaditya Ramdas. 28 Apr 2025.
2. Multi-Armed Bandits with Abstention. Junwen Yang, Tianyuan Jin, Vincent Y. F. Tan. 23 Feb 2024.
3. When is Agnostic Reinforcement Learning Statistically Tractable? Zeyu Jia, Gene Li, Alexander Rakhlin, Ayush Sekhari, Nathan Srebro. 09 Oct 2023. [OffRL]
4. CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. Shubhada Agrawal, Timothée Mathieu, D. Basu, Odalric-Ambrym Maillard. 28 Sep 2023.
5. Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to Analysis of Bayesian Algorithms. Denis Belomestny, Pierre Menard, A. Naumov, D. Tiapkin, Michal Valko. 06 Apr 2023.
6. Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments. Runlong Zhou, Zihan Zhang, S. Du. 31 Jan 2023.
7. Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits. Nikolai Karpov, Qin Zhang. 26 Jan 2023.
8. Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds. Hao Liang, Zhihui Luo. 25 Oct 2022.
9. Reward-Mixing MDPs with a Few Latent Contexts are Learnable. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 05 Oct 2022.
10. Square-root regret bounds for continuous-time episodic Markov decision processes. Xuefeng Gao, X. Zhou. 03 Oct 2022.
11. Near-Optimal Collaborative Learning in Bandits. Clémence Réda, Sattar Vakili, E. Kaufmann. 31 May 2022. [FedML]
12. Instance-Dependent Regret Analysis of Kernelized Bandits. S. Shekhar, T. Javidi. 12 Mar 2022.
13. On Slowly-varying Non-stationary Bandits. Ramakrishnan Krishnamurthy, Médéric Fourmy. 25 Oct 2021.
14. Reinforcement Learning in Reward-Mixing MDPs. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 07 Oct 2021.
15. A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits. Joel Q. L. Chang, Vincent Y. F. Tan. 25 Aug 2021.
16. The Role of Contextual Information in Best Arm Identification. Masahiro Kato, Kaito Ariu. 26 Jun 2021.
17. RL for Latent MDPs: Regret Guarantees and a Lower Bound. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 09 Feb 2021.
18. Confidence-Budget Matching for Sequential Budgeted Learning. Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor. 05 Feb 2021.
19. Optimal Thompson Sampling strategies for support-aware CVaR bandits. Dorian Baudry, Romain Gautron, E. Kaufmann, Odalric-Ambrym Maillard. 10 Dec 2020.
20. Gamification of Pure Exploration for Linear Bandits. Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko. 02 Jul 2020.
21. Real-time calibration of coherent-state receivers: learning by trial and error. M. Bilkis, M. Rosati, R. M. Yepes, J. Calsamiglia. 28 Jan 2020.
22. Exploration by Optimisation in Partial Monitoring. Tor Lattimore, Csaba Szepesvári. 12 Jul 2019.
23. From self-tuning regulators to reinforcement learning and back again. Nikolai Matni, Alexandre Proutiere, Anders Rantzer, Stephen Tu. 27 Jun 2019.
24. Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost? D. Basu, Christos Dimitrakakis, Aristide C. Y. Tossou. 29 May 2019.
25. Polynomial Cost of Adaptation for X-Armed Bandits. Hédi Hadiji. 24 May 2019.
26. Sample Complexity Lower Bounds for Linear System Identification. Yassir Jedra, Alexandre Proutiere. 25 Mar 2019.
27. Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling. E. Kaufmann, Wouter M. Koolen, Aurélien Garivier. 04 Jun 2018.
28. Exploration in Structured Reinforcement Learning. Jungseul Ok, Alexandre Proutiere, Damianos Tranos. 03 Jun 2018.
29. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints. Aurélien Garivier, Hédi Hadiji, Pierre Menard, Gilles Stoltz. 14 May 2018.
30. Corrupt Bandits for Preserving Local Privacy. Pratik Gajane, Tanguy Urvoy, E. Kaufmann. 16 Aug 2017.
31. Learning the distribution with largest mean: two bandit frameworks. E. Kaufmann, Aurélien Garivier. 31 Jan 2017.
32. Bounded regret in stochastic multi-armed bandits. Sébastien Bubeck, Vianney Perchet, Philippe Rigollet. 06 Feb 2013.