© 2025 ResearchTrend.AI, All rights reserved.

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

17 August 2020
Wenlong Mou
Zheng Wen
Xi Chen

Papers citing "On the Sample Complexity of Reinforcement Learning with Policy Space Generalization"

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity
S. Du
Jason D. Lee
G. Mahajan
Ruosong Wang
17 Feb 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
12 Dec 2019
Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions
J. Schmidhuber
05 Dec 2019
Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
S. Du
Sham Kakade
Ruosong Wang
Lin F. Yang
07 Oct 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
01 Aug 2019
Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Daniel Russo
OffRL
07 Jun 2019
Global Optimality Guarantees For Policy Gradient Methods
Jalaj Bhandari
Daniel Russo
05 Jun 2019
Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
Max Simchowitz
Kevin Jamieson
09 May 2019
Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
Dhruv Malik
A. Pananjady
Kush S. Bhatia
K. Khamaru
Peter L. Bartlett
Martin J. Wainwright
20 Dec 2018
Information-Directed Exploration for Deep Reinforcement Learning
Nikolay Nikolov
Johannes Kirschner
Felix Berkenkamp
Andreas Krause
18 Dec 2018
Private PAC learning implies finite Littlestone dimension
N. Alon
Roi Livni
M. Malliaris
Shay Moran
04 Jun 2018
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
Maryam Fazel
Rong Ge
Sham Kakade
M. Mesbahi
15 Jan 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
20 Jul 2017
Noisy Networks for Exploration
Meire Fortunato
M. G. Azar
Bilal Piot
Jacob Menick
Ian Osband
...
Rémi Munos
Demis Hassabis
Olivier Pietquin
Charles Blundell
Shane Legg
30 Jun 2017
Deep Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Daniel Russo
Zheng Wen
22 Mar 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
16 Mar 2017
Contextual Decision Processes with Low Bellman Rank are PAC-Learnable
Nan Jiang
A. Krishnamurthy
Alekh Agarwal
John Langford
Robert Schapire
29 Oct 2016
Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Ian Osband
Benjamin Van Roy
BDL
01 Jul 2016
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
09 Sep 2015
End-to-End Training of Deep Visuomotor Policies
Sergey Levine
Chelsea Finn
Trevor Darrell
Pieter Abbeel
BDL
02 Apr 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
19 Feb 2015
Model-based Reinforcement Learning and the Eluder Dimension
Ian Osband
Benjamin Van Roy
07 Jun 2014
Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization
Zheng Wen
Benjamin Van Roy
18 Jul 2013
Learning to Optimize Via Posterior Sampling
Daniel Russo
Benjamin Van Roy
11 Jan 2013