arXiv:1602.07182
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
23 February 2016
Aurélien Garivier
Pierre Ménard
Gilles Stoltz
Papers citing "Explore First, Exploit Next: The True Shape of Regret in Bandit Problems" (32 papers shown):
1. On Stopping Times of Power-one Sequential Tests: Tight Lower and Upper Bounds. Shubhada Agrawal, Aaditya Ramdas. 28 Apr 2025.
2. Multi-Armed Bandits with Abstention. Junwen Yang, Tianyuan Jin, Vincent Y. F. Tan. 23 Feb 2024.
3. When is Agnostic Reinforcement Learning Statistically Tractable? Zeyu Jia, Gene Li, Alexander Rakhlin, Ayush Sekhari, Nathan Srebro. 09 Oct 2023. [OffRL]
4. CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. Shubhada Agrawal, Timothée Mathieu, D. Basu, Odalric-Ambrym Maillard. 28 Sep 2023.
5. Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to Analysis of Bayesian Algorithms. Denis Belomestny, Pierre Menard, A. Naumov, D. Tiapkin, Michal Valko. 06 Apr 2023.
6. Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments. Runlong Zhou, Zihan Zhang, S. Du. 31 Jan 2023.
7. Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits. Nikolai Karpov, Qin Zhang. 26 Jan 2023.
8. Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds. Hao Liang, Zhihui Luo. 25 Oct 2022.
9. Reward-Mixing MDPs with a Few Latent Contexts are Learnable. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 05 Oct 2022.
10. Square-root regret bounds for continuous-time episodic Markov decision processes. Xuefeng Gao, X. Zhou. 03 Oct 2022.
11. Near-Optimal Collaborative Learning in Bandits. Clémence Réda, Sattar Vakili, E. Kaufmann. 31 May 2022. [FedML]
12. Instance-Dependent Regret Analysis of Kernelized Bandits. S. Shekhar, T. Javidi. 12 Mar 2022.
13. On Slowly-varying Non-stationary Bandits. Ramakrishnan Krishnamurthy, Médéric Fourmy. 25 Oct 2021.
14. Reinforcement Learning in Reward-Mixing MDPs. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 07 Oct 2021.
15. A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits. Joel Q. L. Chang, Vincent Y. F. Tan. 25 Aug 2021.
16. The Role of Contextual Information in Best Arm Identification. Masahiro Kato, Kaito Ariu. 26 Jun 2021.
17. RL for Latent MDPs: Regret Guarantees and a Lower Bound. Jeongyeol Kwon, Yonathan Efroni, C. Caramanis, Shie Mannor. 09 Feb 2021.
18. Confidence-Budget Matching for Sequential Budgeted Learning. Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor. 05 Feb 2021.
19. Optimal Thompson Sampling strategies for support-aware CVaR bandits. Dorian Baudry, Romain Gautron, E. Kaufmann, Odalric-Ambrym Maillard. 10 Dec 2020.
20. Gamification of Pure Exploration for Linear Bandits. Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko. 02 Jul 2020.
21. Real-time calibration of coherent-state receivers: learning by trial and error. M. Bilkis, M. Rosati, R. M. Yepes, J. Calsamiglia. 28 Jan 2020.
22. Exploration by Optimisation in Partial Monitoring. Tor Lattimore, Csaba Szepesvári. 12 Jul 2019.
23. From self-tuning regulators to reinforcement learning and back again. Nikolai Matni, Alexandre Proutiere, Anders Rantzer, Stephen Tu. 27 Jun 2019.
24. Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost? D. Basu, Christos Dimitrakakis, Aristide C. Y. Tossou. 29 May 2019.
25. Polynomial Cost of Adaptation for X-Armed Bandits. Hédi Hadiji. 24 May 2019.
26. Sample Complexity Lower Bounds for Linear System Identification. Yassir Jedra, Alexandre Proutiere. 25 Mar 2019.
27. Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling. E. Kaufmann, Wouter M. Koolen, Aurélien Garivier. 04 Jun 2018.
28. Exploration in Structured Reinforcement Learning. Jungseul Ok, Alexandre Proutiere, Damianos Tranos. 03 Jun 2018.
29. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints. Aurélien Garivier, Hédi Hadiji, Pierre Menard, Gilles Stoltz. 14 May 2018.
30. Corrupt Bandits for Preserving Local Privacy. Pratik Gajane, Tanguy Urvoy, E. Kaufmann. 16 Aug 2017.
31. Learning the distribution with largest mean: two bandit frameworks. E. Kaufmann, Aurélien Garivier. 31 Jan 2017.
32. Bounded regret in stochastic multi-armed bandits. Sébastien Bubeck, Vianney Perchet, Philippe Rigollet. 06 Feb 2013.