2107.08346
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
18 July 2021
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
Papers citing
"Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses"
33 / 33 papers shown
Title
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
184
0
0
12 May 2025
Decision Making in Hybrid Environments: A Model Aggregation Approach
Haolin Liu
Chen-Yu Wei
Julian Zimmert
230
0
0
09 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
156
45
0
31 Dec 2024
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Masahiro Kato
Shinji Ito
139
0
0
05 Mar 2024
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chen Ye
Wei Xiong
Quanquan Gu
Tong Zhang
166
31
0
12 Dec 2022
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
91
53
0
24 Mar 2021
Improved Regret Bound and Experience Replay in Regularized Policy Iteration
N. Lazić
Dong Yin
Yasin Abbasi-Yadkori
Csaba Szepesvári
OffRL
44
18
0
25 Feb 2021
Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case
Liyu Chen
Haipeng Luo
72
31
0
10 Feb 2021
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
Liyu Chen
Haipeng Luo
Chen-Yu Wei
73
32
0
07 Dec 2020
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Rahul Jain
70
43
0
23 Jul 2020
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
Alekh Agarwal
Mikael Henaff
Sham Kakade
Wen Sun
OffRL
70
110
0
16 Jul 2020
Online learning in MDPs with linear function approximation and bandit feedback
Gergely Neu
Julia Olkhovskaya
49
32
0
03 Jul 2020
On Reward-Free Reinforcement Learning with Linear Function Approximation
Ruosong Wang
S. Du
Lin F. Yang
Ruslan Salakhutdinov
OffRL
73
107
0
19 Jun 2020
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
175
53
0
14 Jun 2020
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
56
90
0
19 Feb 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
66
283
0
12 Dec 2019
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
75
104
0
03 Dec 2019
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
Andrea Zanette
David Brandfonbrener
Emma Brunskill
Matteo Pirotta
A. Lazaric
78
132
0
01 Nov 2019
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
136
108
0
15 Oct 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
69
321
0
01 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
98
560
0
11 Jul 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang
Mengdi Wang
OffRL
GP
66
288
0
24 May 2019
Online Convex Optimization in Adversarial Markov Decision Processes
Aviv A. Rosenberg
Yishay Mansour
54
138
0
19 May 2019
Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
Kefan Dong
Yuanhao Wang
Xiaoyu Chen
Liwei Wang
OffRL
65
96
0
27 Jan 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
115
276
0
01 Jan 2019
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
78
812
0
10 Jul 2018
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
Ronan Fruit
Matteo Pirotta
A. Lazaric
R. Ortner
89
117
0
12 Feb 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
535
19,265
0
20 Jul 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
92
778
0
16 Mar 2017
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Christoph Dann
Emma Brunskill
74
249
0
29 Oct 2015
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
Gergely Neu
402
185
0
10 Jun 2015
On the Sample Complexity of Reinforcement Learning with a Generative Model
M. G. Azar
Rémi Munos
H. Kappen
76
156
0
27 Jun 2012
Contextual Bandit Algorithms with Supervised Learning Guarantees
A. Beygelzimer
John Langford
Lihong Li
L. Reyzin
Robert Schapire
OffRL
199
326
0
22 Feb 2010