Global Optimality Guarantees For Policy Gradient Methods

5 June 2019
Jalaj Bhandari
Daniel Russo

Papers citing "Global Optimality Guarantees For Policy Gradient Methods"

50 / 59 papers shown
Near-Optimal Sample Complexity for Iterated CVaR Reinforcement Learning with a Generative Model
Zilong Deng
Simon Khan
Shaofeng Zou
116
0
0
11 Mar 2025
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF
Flint Xiaofeng Fan
Cheston Tan
Yew-Soon Ong
Roger Wattenhofer
Wei Tsang Ooi
118
1
0
20 Dec 2024
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
Fu-Chieh Chang
Yu-Ting Lee
Hui-Ying Shih
Pei-Yuan Wu
Pei-Yuan Wu
OffRL
LRM
378
0
0
31 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
119
7
0
06 Oct 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
89
2
0
29 Aug 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
62
0
0
23 Jul 2024
Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence
Minheng Xiao
Xian Yu
Lei Ying
74
2
0
23 May 2024
Behind the Myth of Exploration in Policy Gradients
Adrien Bolland
Gaspard Lambrechts
Damien Ernst
79
0
0
31 Jan 2024
On the Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures
Xian Yu
Lei Ying
52
5
0
26 Jan 2023
Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
Masatoshi Uehara
Masaaki Imaizumi
Nan Jiang
Nathan Kallus
Wen Sun
Tengyang Xie
OffRL
41
53
0
05 Feb 2021
What are the Statistical Limits of Offline RL with Linear Function Approximation?
Ruosong Wang
Dean Phillips Foster
Sham Kakade
OffRL
149
163
0
22 Oct 2020
Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
Gellert Weisz
Philip Amortila
Csaba Szepesvári
OffRL
147
80
0
03 Oct 2020
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
Nathan Kallus
Masatoshi Uehara
OffRL
37
15
0
06 Jun 2020
The Ingredients of Real-World Robotic Reinforcement Learning
Henry Zhu
Justin Yu
Abhishek Gupta
Dhruv Shah
Kristian Hartikainen
Avi Singh
Vikash Kumar
Sergey Levine
OffRL
100
176
0
27 Apr 2020
Statistically Efficient Off-Policy Policy Gradients
Nathan Kallus
Masatoshi Uehara
OffRL
64
39
0
10 Feb 2020
Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
S. Du
Sham Kakade
Ruosong Wang
Lin F. Yang
173
193
0
07 Oct 2019
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Lior Shani
Yonathan Efroni
Shie Mannor
49
175
0
06 Sep 2019
Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
Lingxiao Wang
Qi Cai
Zhuoran Yang
Zhaoran Wang
77
241
0
29 Aug 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
61
320
0
01 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
86
556
0
11 Jul 2019
Monte Carlo Gradient Estimation in Machine Learning
S. Mohamed
Mihaela Rosca
Michael Figurnov
A. Mnih
67
408
0
25 Jun 2019
Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen
Nan Jiang
OOD
OffRL
147
376
0
01 May 2019
A Theory of Regularized Markov Decision Processes
Matthieu Geist
B. Scherrer
Olivier Pietquin
109
325
0
31 Jan 2019
Soft Actor-Critic Algorithms and Applications
Tuomas Haarnoja
Aurick Zhou
Kristian Hartikainen
George Tucker
Sehoon Ha
...
Vikash Kumar
Henry Zhu
Abhishek Gupta
Pieter Abbeel
Sergey Levine
133
2,422
0
13 Dec 2018
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot
Franck Gabriel
Clément Hongler
252
3,194
0
20 Jun 2018
Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition
Justin Fu
Avi Singh
Dibya Ghosh
Larry Yang
Sergey Levine
BDL
41
125
0
29 May 2018
Stochastic subgradient method converges on tame functions
Damek Davis
Dmitriy Drusvyatskiy
Sham Kakade
Jason D. Lee
56
251
0
20 Apr 2018
Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations
Dimitri Bertsekas
OffRL
53
131
0
12 Apr 2018
On the Power of Over-parametrization in Neural Networks with Quadratic Activation
S. Du
Jason D. Lee
160
271
0
03 Mar 2018
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
Maryam Fazel
Rong Ge
Sham Kakade
M. Mesbahi
77
601
0
15 Jan 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
446
18,931
0
20 Jul 2017
Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems
Damek Davis
Benjamin Grimmer
48
113
0
12 Jul 2017
Parameter Space Noise for Exploration
Matthias Plappert
Rein Houthooft
Prafulla Dhariwal
Szymon Sidor
Richard Y. Chen
Xi Chen
Tamim Asfour
Pieter Abbeel
Marcin Andrychowicz
52
595
0
06 Jun 2017
Deep Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Daniel Russo
Zheng Wen
89
306
0
22 Mar 2017
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Tim Salimans
Jonathan Ho
Xi Chen
Szymon Sidor
Ilya Sutskever
92
1,537
0
10 Mar 2017
Towards Generalization and Simplicity in Continuous Control
Aravind Rajeswaran
Kendall Lowrey
E. Todorov
Sham Kakade
OffRL
86
276
0
08 Mar 2017
How to Escape Saddle Points Efficiently
Chi Jin
Rong Ge
Praneeth Netrapalli
Sham Kakade
Michael I. Jordan
ODL
213
836
0
02 Mar 2017
Improving Policy Gradient by Exploring Under-appreciated Rewards
Ofir Nachum
Mohammad Norouzi
Dale Schuurmans
66
44
0
28 Nov 2016
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi
J. Nutini
Mark Schmidt
262
1,218
0
16 Aug 2016
Stochastic Frank-Wolfe Methods for Nonconvex Optimization
Sashank J. Reddi
S. Sra
Barnabás Póczós
Alex Smola
57
140
0
27 Jul 2016
Matrix Completion has No Spurious Local Minimum
Rong Ge
Jason D. Lee
Tengyu Ma
99
599
0
24 May 2016
Global Optimality of Local Search for Low Rank Matrix Recovery
Srinadh Bhojanapalli
Behnam Neyshabur
Nathan Srebro
ODL
100
388
0
23 May 2016
Deep Learning without Poor Local Minima
Kenji Kawaguchi
ODL
213
922
0
23 May 2016
Stochastic Variance Reduction for Nonconvex Optimization
Sashank J. Reddi
Ahmed S. Hefny
S. Sra
Barnabás Póczós
Alex Smola
92
600
0
19 Mar 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
191
8,833
0
04 Feb 2016
Complete Dictionary Recovery over the Sphere I: Overview and the Geometric Picture
Ju Sun
Qing Qu
John N. Wright
114
159
0
11 Nov 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
84
3,399
0
08 Jun 2015
Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition
Rong Ge
Furong Huang
Chi Jin
Yang Yuan
135
1,058
0
06 Mar 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,764
0
19 Feb 2015
On the Computational Efficiency of Training Neural Networks
Roi Livni
Shai Shalev-Shwartz
Ohad Shamir
139
479
0
05 Oct 2014