ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

18 July 2021
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
arXiv: 2107.08346

Papers citing "Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses"

33 papers
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
12 May 2025
Decision Making in Hybrid Environments: A Model Aggregation Approach
Haolin Liu
Chen-Yu Wei
Julian Zimmert
09 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
31 Dec 2024
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Masahiro Kato
Shinji Ito
05 Mar 2024
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chen Ye
Wei Xiong
Quanquan Gu
Tong Zhang
12 Dec 2022
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
24 Mar 2021
Improved Regret Bound and Experience Replay in Regularized Policy Iteration
N. Lazić
Dong Yin
Yasin Abbasi-Yadkori
Csaba Szepesvári
OffRL
25 Feb 2021
Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case
Liyu Chen
Haipeng Luo
10 Feb 2021
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
Liyu Chen
Haipeng Luo
Chen-Yu Wei
07 Dec 2020
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Rahul Jain
23 Jul 2020
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
Alekh Agarwal
Mikael Henaff
Sham Kakade
Wen Sun
OffRL
16 Jul 2020
Online learning in MDPs with linear function approximation and bandit feedback
Gergely Neu
Julia Olkhovskaya
03 Jul 2020
On Reward-Free Reinforcement Learning with Linear Function Approximation
Ruosong Wang
S. Du
Lin F. Yang
Ruslan Salakhutdinov
OffRL
19 Jun 2020
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
14 Jun 2020
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
19 Feb 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
12 Dec 2019
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
03 Dec 2019
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
Andrea Zanette
David Brandfonbrener
Emma Brunskill
Matteo Pirotta
A. Lazaric
01 Nov 2019
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
15 Oct 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
01 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
11 Jul 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang
Mengdi Wang
OffRLGP
24 May 2019
Online Convex Optimization in Adversarial Markov Decision Processes
Aviv A. Rosenberg
Yishay Mansour
19 May 2019
Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
Kefan Dong
Yuanhao Wang
Xiaoyu Chen
Liwei Wang
OffRL
27 Jan 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
01 Jan 2019
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
10 Jul 2018
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
Ronan Fruit
Matteo Pirotta
A. Lazaric
R. Ortner
12 Feb 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
20 Jul 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
16 Mar 2017
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Christoph Dann
Emma Brunskill
29 Oct 2015
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
Gergely Neu
10 Jun 2015
On the Sample Complexity of Reinforcement Learning with a Generative Model
M. G. Azar
Rémi Munos
H. Kappen
27 Jun 2012
Contextual Bandit Algorithms with Supervised Learning Guarantees
A. Beygelzimer
John Langford
Lihong Li
L. Reyzin
Robert Schapire
OffRL
22 Feb 2010