Explore no more: Improved high-probability regret bounds for non-stochastic bandits

10 June 2015

Papers citing "Explore no more: Improved high-probability regret bounds for non-stochastic bandits"


Title
Online Episodic Convex Reinforcement Learning B. Moreno Khaled Eldowa Pierre Gaillard Margaux Brégère Nadia Oudjane OffRL 31 0 0 12 May 2025
A New Benchmark for Online Learning with Budget-Balancing Constraints M. Braverman Jingyi Liu Jieming Mao Jon Schneider Eric Xue 60 0 0 19 Mar 2025
Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context Jianyu Xu Qiuzhuang Sun Yang Yang Huadong Mo Daoyi Dong 83 0 0 24 Feb 2025
Beyond Minimax Rates in Group Distributionally Robust Optimization via a Novel Notion of Sparsity Quan Nguyen Nishant A. Mehta Cristóbal Guzmán 39 1 0 01 Oct 2024
Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints Martino Bernasconi Matteo Castiglioni A. Celli Federico Fusco 31 2 0 25 May 2024
No-Regret Algorithms in non-Truthful Auctions with Budget and ROI Constraints Gagan Aggarwal Giannis Fikioris Mingfei Zhao 40 5 0 15 Apr 2024
Stochastic Online Optimization for Cyber-Physical and Robotic Systems Hao Ma Melanie Zeilinger Michael Muehlebach 62 0 0 08 Apr 2024
Distributed No-Regret Learning for Multi-Stage Systems with End-to-End Bandit Feedback I-Hong Hou OffRL 44 0 0 06 Apr 2024
Learning Adversarial MDPs with Stochastic Hard Constraints Francesco Emanuele Stradi Matteo Castiglioni A. Marchesi Nicola Gatti 39 4 0 06 Mar 2024
CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption Shubhada Agrawal Timothée Mathieu D. Basu Odalric-Ambrym Maillard 30 2 0 28 Sep 2023
A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays Saeed Masoudian Julian Zimmert Yevgeny Seldin 47 3 0 21 Aug 2023
Anytime Model Selection in Linear Bandits Parnian Kassraie N. Emmenegger Andreas Krause Aldo Pacchiano 54 2 0 24 Jul 2023
Meta-Learning Adversarial Bandit Algorithms M. Khodak Ilya Osadchiy Keegan Harris Maria-Florina Balcan Kfir Y. Levy Ron Meir Zhiwei Steven Wu FedML 30 2 0 05 Jul 2023
Bandits with Replenishable Knapsacks: the Best of both Worlds Martino Bernasconi Matteo Castiglioni A. Celli Federico Fusco 41 4 0 14 Jun 2023
Bandits for Sponsored Search Auctions under Unknown Valuation Model: Case Study in E-Commerce Advertising Danil Provodin Jérémie Joudioux E. Duryev 24 0 0 31 Mar 2023
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback Yang Cai Haipeng Luo Chen-Yu Wei Weiqiang Zheng 34 18 0 05 Mar 2023
Estimating Optimal Policy Value in General Linear Contextual Bandits Jonathan Lee Weihao Kong Aldo Pacchiano Vidya Muthukumar Emma Brunskill 30 0 0 19 Feb 2023
Contextual Bandits and Optimistically Universal Learning Moise Blanchard Steve Hanneke Patrick Jaillet OffRL 28 1 0 31 Dec 2022
SLOPT: Bandit Optimization Framework for Mutation-Based Fuzzing Yuki Koike H. Katsura Hiromu Yakura Yuma Kurogome 31 5 0 07 Nov 2022
On-Demand Sampling: Learning Optimally from Multiple Distributions Nika Haghtalab Michael I. Jordan Eric Zhao FedML 55 35 0 22 Oct 2022
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs Haipeng Luo Hanghang Tong Mengxiao Zhang Yuheng Zhang 16 5 0 04 Oct 2022
Actor-Critic based Improper Reinforcement Learning Mohammadi Zaki Avinash Mohan Aditya Gopalan Shie Mannor 21 2 0 19 Jul 2022
Best of Both Worlds Model Selection Aldo Pacchiano Christoph Dann Claudio Gentile 36 10 0 29 Jun 2022
The Complexity of Markov Equilibrium in Stochastic Games C. Daskalakis Noah Golowich Kaiqing Zhang 41 56 0 08 Apr 2022
Generalized Bandit Regret Minimizer Framework in Imperfect Information Extensive-Form Game Lin Meng Yang Gao 52 1 0 11 Mar 2022
Near-Optimal Learning of Extensive-Form Games with Imperfect Information Yu Bai Chi Jin Song Mei Tiancheng Yu 26 26 0 03 Feb 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback Tiancheng Jin Tal Lancewicki Haipeng Luo Yishay Mansour Aviv A. Rosenberg 74 21 0 31 Jan 2022
Uncoupled Bandit Learning towards Rationalizability: Benchmarks, Barriers, and Algorithms Jibang Wu Haifeng Xu Fan Yao 35 1 0 10 Nov 2021
Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure Hsu Kao Chen-Yu Wei V. Subramanian 33 12 0 01 Nov 2021
On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning Weichao Mao Lin F. Yang Kaiqing Zhang Tamer Basar 46 57 0 12 Oct 2021
Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games Weichao Mao Tamer Basar 36 66 0 12 Oct 2021
When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently? Ziang Song Song Mei Yu Bai 74 67 0 08 Oct 2021
Bandit Algorithms for Precision Medicine Yangyi Lu Ziping Xu Ambuj Tewari 66 11 0 10 Aug 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses Haipeng Luo Chen-Yu Wei Chung-Wei Lee 38 44 0 18 Jul 2021
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall Tadashi Kozuno Pierre Ménard Rémi Munos Michal Valko 30 18 0 11 Jun 2021
Leveraging Good Representations in Linear Contextual Bandits Matteo Papini Andrea Tirinzoni Marcello Restelli A. Lazaric Matteo Pirotta 35 26 0 08 Apr 2021
A Simple Approach for Non-stationary Linear Bandits Peng Zhao Lijun Zhang Yuan Jiang Zhi-Hua Zhou 36 81 0 09 Mar 2021
Near-Optimal Reinforcement Learning with Self-Play Yu Bai Chi Jin Tiancheng Yu 24 130 0 22 Jun 2020
Model selection for contextual bandits Dylan J. Foster A. Krishnamurthy Haipeng Luo OffRL 34 90 0 03 Jun 2019
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model Gi-Soo Kim M. Paik 22 14 0 31 Jan 2019
Taming Non-stationary Bandits: A Bayesian Approach Vishnu Raj Sheetal Kalyani 32 76 0 31 Jul 2017
Online Learning with Abstention Corinna Cortes Giulia DeSalvo Claudio Gentile M. Mohri Scott Yang 9 47 0 09 Mar 2017
Refined Lower Bounds for Adversarial Bandits Sébastien Gerchinovitz Tor Lattimore AAML 25 58 0 24 May 2016
Delay and Cooperation in Nonstochastic Bandits Nicolò Cesa-Bianchi Claudio Gentile Yishay Mansour Alberto Minora 14 144 0 15 Feb 2016
Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback N. Alon Nicolò Cesa-Bianchi Claudio Gentile Shie Mannor Yishay Mansour Ohad Shamir OffRL 36 130 0 30 Sep 2014