
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Aurélien Garivier, Pierre Ménard, Gilles Stoltz
arXiv:1602.07182, 23 February 2016

Papers citing "Explore First, Exploit Next: The True Shape of Regret in Bandit Problems"

32 papers shown.

  1. On Stopping Times of Power-one Sequential Tests: Tight Lower and Upper Bounds. Shubhada Agrawal, Aaditya Ramdas. 28 Apr 2025.
  2. Multi-Armed Bandits with Abstention. Junwen Yang, Tianyuan Jin, Vincent Y. F. Tan. 23 Feb 2024.
  3. When is Agnostic Reinforcement Learning Statistically Tractable? Zeyu Jia, Gene Li, Alexander Rakhlin, Ayush Sekhari, Nathan Srebro. 09 Oct 2023.
  4. CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. Shubhada Agrawal, Timothée Mathieu, D. Basu, Odalric-Ambrym Maillard. 28 Sep 2023.
  5. Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms. Denis Belomestny, Pierre Menard, A. Naumov, D. Tiapkin, Michal Valko. 06 Apr 2023.
  6. Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments. Runlong Zhou, Zihan Zhang, S. Du. 31 Jan 2023.
  7. Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits. Nikolai Karpov, Qin Zhang. 26 Jan 2023.
  8. Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds. Hao Liang, Zhihui Luo. 25 Oct 2022.
  9. Reward-Mixing MDPs with a Few Latent Contexts are Learnable. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 05 Oct 2022.
  10. Square-root regret bounds for continuous-time episodic Markov decision processes. Xuefeng Gao, X. Zhou. 03 Oct 2022.
  11. Near-Optimal Collaborative Learning in Bandits. Clémence Réda, Sattar Vakili, E. Kaufmann. 31 May 2022.
  12. Instance-Dependent Regret Analysis of Kernelized Bandits. S. Shekhar, T. Javidi. 12 Mar 2022.
  13. On Slowly-varying Non-stationary Bandits. Ramakrishnan Krishnamurthy, Médéric Fourmy. 25 Oct 2021.
  14. Reinforcement Learning in Reward-Mixing MDPs. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 07 Oct 2021.
  15. A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits. Joel Q. L. Chang, Vincent Y. F. Tan. 25 Aug 2021.
  16. The Role of Contextual Information in Best Arm Identification. Masahiro Kato, Kaito Ariu. 26 Jun 2021.
  17. RL for Latent MDPs: Regret Guarantees and a Lower Bound. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 09 Feb 2021.
  18. Confidence-Budget Matching for Sequential Budgeted Learning. Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor. 05 Feb 2021.
  19. Optimal Thompson Sampling strategies for support-aware CVaR bandits. Dorian Baudry, Romain Gautron, E. Kaufmann, Odalric-Ambrym Maillard. 10 Dec 2020.
  20. Gamification of Pure Exploration for Linear Bandits. Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko. 02 Jul 2020.
  21. Real-time calibration of coherent-state receivers: learning by trial and error. M. Bilkis, M. Rosati, R. M. Yepes, J. Calsamiglia. 28 Jan 2020.
  22. Exploration by Optimisation in Partial Monitoring. Tor Lattimore, Csaba Szepesvári. 12 Jul 2019.
  23. From self-tuning regulators to reinforcement learning and back again. Nikolai Matni, Alexandre Proutiere, Anders Rantzer, Stephen Tu. 27 Jun 2019.
  24. Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost? D. Basu, Christos Dimitrakakis, Aristide C. Y. Tossou. 29 May 2019.
  25. Polynomial Cost of Adaptation for X-Armed Bandits. Hédi Hadiji. 24 May 2019.
  26. Sample Complexity Lower Bounds for Linear System Identification. Yassir Jedra, Alexandre Proutiere. 25 Mar 2019.
  27. Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling. E. Kaufmann, Wouter M. Koolen, Aurélien Garivier. 04 Jun 2018.
  28. Exploration in Structured Reinforcement Learning. Jungseul Ok, Alexandre Proutiere, Damianos Tranos. 03 Jun 2018.
  29. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints. Aurélien Garivier, Hédi Hadiji, Pierre Menard, Gilles Stoltz. 14 May 2018.
  30. Corrupt Bandits for Preserving Local Privacy. Pratik Gajane, Tanguy Urvoy, E. Kaufmann. 16 Aug 2017.
  31. Learning the distribution with largest mean: two bandit frameworks. E. Kaufmann, Aurélien Garivier. 31 Jan 2017.
  32. Bounded regret in stochastic multi-armed bandits. Sébastien Bubeck, Vianney Perchet, Philippe Rigollet. 06 Feb 2013.