Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

27 January 2019

Papers citing "Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP"

Automatic Reward Shaping from Confounded Offline Data Mingxuan Li Junzhe Zhang Elias Bareinboim OffRL OnRL 33 0 0 16 May 2025
The Bandit Whisperer: Communication Learning for Restless Bandits Yunfan Zhao Tonghan Wang Dheeraj M. Nagaraj Aparna Taneja Milind Tambe 54 5 0 11 Aug 2024
Learning to Steer Markovian Agents under Model Uncertainty Jiawei Huang Vinzenz Thoma Zebang Shen H. Nax Niao He 48 2 0 14 Jul 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices Jiin Woo Laixi Shi Gauri Joshi Yuejie Chi OffRL 34 3 0 08 Feb 2024
Settling the Sample Complexity of Online Reinforcement Learning Zihan Zhang Yuxin Chen Jason D. Lee S. Du OffRL 98 22 0 25 Jul 2023
Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time Xiang Ji Gen Li OffRL 32 7 0 24 May 2023
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback Yang Cai Haipeng Luo Chen-Yu Wei Weiqiang Zheng 31 18 0 05 Mar 2023
Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor Julien Grand-Clément Marko Petrik 35 14 0 31 Jan 2023
Provable Reset-free Reinforcement Learning by No-Regret Reduction Hoai-An Nguyen Ching-An Cheng OffRL 31 2 0 06 Jan 2023
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning Yinglun Xu Qi Zeng Gagandeep Singh AAML 40 6 0 30 May 2022
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies Zihan Zhang Xiangyang Ji S. Du 30 21 0 24 Mar 2022
The Efficacy of Pessimism in Asynchronous Q-Learning Yuling Yan Gen Li Yuxin Chen Jianqing Fan OffRL 78 40 0 14 Mar 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints Liyu Chen R. Jain Haipeng Luo 57 25 0 31 Jan 2022
Recent Advances in Reinforcement Learning in Finance B. Hambly Renyuan Xu Huining Yang OffRL 29 167 0 08 Dec 2021
Interesting Object, Curious Agent: Learning Task-Agnostic Exploration Simone Parisi Victoria Dean Deepak Pathak Abhinav Gupta LM&Ro 40 50 0 25 Nov 2021
Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning Yuanzhi Li Ruosong Wang Lin F. Yang 27 20 0 01 Nov 2021
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning Gen Li Laixi Shi Yuxin Chen Yuejie Chi OffRL 47 51 0 09 Oct 2021
A Survey of Exploration Methods in Reinforcement Learning Susan Amin Maziar Gomrokchi Harsh Satija H. V. Hoof Doina Precup OffRL 34 80 0 01 Sep 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses Haipeng Luo Chen-Yu Wei Chung-Wei Lee 38 44 0 18 Jul 2021
A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes Honghao Wei Xin Liu Lei Ying 21 21 0 03 Jun 2021
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret Jean Tarbouriech Runlong Zhou S. Du Matteo Pirotta M. Valko A. Lazaric 59 35 0 22 Apr 2021
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis Gen Li Changxiao Cai Yuxin Chen Yuting Wei Yuejie Chi OffRL 50 75 0 12 Feb 2021
Control with adaptive Q-learning J. Araújo Mário A. T. Figueiredo M. Botto 33 2 0 03 Nov 2020
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs Jiafan He Dongruo Zhou Quanquan Gu 21 37 0 01 Oct 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon Zihan Zhang Xiangyang Ji S. Du OffRL 17 104 0 28 Sep 2020
Single-partition adaptive Q-learning J. Araújo Mário A. T. Figueiredo M. Botto OffRL 20 2 0 14 Jul 2020
A Provably Efficient Sample Collection Strategy for Reinforcement Learning Jean Tarbouriech Matteo Pirotta Michal Valko A. Lazaric OffRL 25 16 0 13 Jul 2020
Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping Dongruo Zhou Jiafan He Quanquan Gu 30 133 0 23 Jun 2020
$Q$-learning with Logarithmic Regret Kunhe Yang Lin F. Yang S. Du 43 59 0 16 Jun 2020
Multi-Agent Reinforcement Learning in Stochastic Networked Systems Yiheng Lin Guannan Qu Longbo Huang Adam Wierman 34 38 0 11 Jun 2020
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium Qiaomin Xie Yudong Chen Zhaoran Wang Zhuoran Yang 39 124 0 17 Feb 2020
Adaptive Approximate Policy Iteration Botao Hao N. Lazić Yasin Abbasi-Yadkori Pooria Joulani Csaba Szepesvári 18 14 0 08 Feb 2020
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes Chen-Yu Wei Mehdi Jafarnia-Jahromi Haipeng Luo Hiteshi Sharma R. Jain 107 100 0 15 Oct 2019