ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.06234
  4. Cited By
Optimization Issues in KL-Constrained Approximate Policy Iteration

Optimization Issues in KL-Constrained Approximate Policy Iteration

11 February 2021
N. Lazić
Botao Hao
Yasin Abbasi-Yadkori
Dale Schuurmans
Csaba Szepesvári
ArXiv (abs)PDFHTML

Papers citing "Optimization Issues in KL-Constrained Approximate Policy Iteration"

19 / 19 papers shown
Title
Mirror Descent Policy Optimization
Mirror Descent Policy Optimization
Manan Tomar
Lior Shani
Yonathan Efroni
Mohammad Ghavamzadeh
99
87
0
20 May 2020
On the Global Convergence Rates of Softmax Policy Gradient Methods
On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
143
294
0
13 May 2020
Leverage the Average: an Analysis of KL Regularization in RL
Leverage the Average: an Analysis of KL Regularization in RL
Nino Vieillard
Tadashi Kozuno
B. Scherrer
Olivier Pietquin
Rémi Munos
Matthieu Geist
60
43
0
31 Mar 2020
Optimistic Policy Optimization with Bandit Feedback
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
56
90
0
19 Feb 2020
Adaptive Approximate Policy Iteration
Adaptive Approximate Policy Iteration
Botao Hao
N. Lazić
Yasin Abbasi-Yadkori
Pooria Joulani
Csaba Szepesvári
63
14
0
08 Feb 2020
Provably Efficient Exploration in Policy Optimization
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
66
283
0
12 Dec 2019
Momentum in Reinforcement Learning
Momentum in Reinforcement Learning
Nino Vieillard
B. Scherrer
Olivier Pietquin
Matthieu Geist
70
35
0
21 Oct 2019
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete
  and Continuous Control
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
H. F. Song
A. Abdolmaleki
Jost Tobias Springenberg
Aidan Clark
Hubert Soyer
...
Dhruva Tirumala
N. Heess
Dan Belov
Martin Riedmiller
M. Botvinick
88
125
0
26 Sep 2019
Adaptive Trust Region Policy Optimization: Global Convergence and Faster
  Rates for Regularized MDPs
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Lior Shani
Yonathan Efroni
Shie Mannor
55
176
0
06 Sep 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and
  Distribution Shift
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
69
321
0
01 Aug 2019
Understanding the impact of entropy on policy optimization
Understanding the impact of entropy on policy optimization
Zafarali Ahmed
Nicolas Le Roux
Mohammad Norouzi
Dale Schuurmans
73
237
0
27 Nov 2018
Maximum a Posteriori Policy Optimisation
Maximum a Posteriori Policy Optimisation
A. Abdolmaleki
Jost Tobias Springenberg
Yuval Tassa
Rémi Munos
N. Heess
Martin Riedmiller
73
478
0
14 Jun 2018
DeepMind Control Suite
DeepMind Control Suite
Yuval Tassa
Yotam Doron
Alistair Muldal
Tom Erez
Yazhe Li
...
A. Abdolmaleki
J. Merel
Andrew Lefrancq
Timothy Lillicrap
Martin Riedmiller
ELMLM&RoBDL
148
1,143
0
02 Jan 2018
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
532
19,265
0
20 Jul 2017
Constrained Policy Optimization
Constrained Policy Optimization
Joshua Achiam
David Held
Aviv Tamar
Pieter Abbeel
126
1,331
0
30 May 2017
A unified view of entropy-regularized Markov decision processes
A unified view of entropy-regularized Markov decision processes
Gergely Neu
Anders Jonsson
Vicencc Gómez
101
264
0
22 May 2017
Reward Augmented Maximum Likelihood for Neural Structured Prediction
Reward Augmented Maximum Likelihood for Neural Structured Prediction
Mohammad Norouzi
Samy Bengio
Zhiwen Chen
Navdeep Jaitly
M. Schuster
Yonghui Wu
Dale Schuurmans
89
253
0
01 Sep 2016
Trust Region Policy Optimization
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,796
0
19 Feb 2015
REGAL: A Regularization based Algorithm for Reinforcement Learning in
  Weakly Communicating MDPs
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs
Peter L. Bartlett
Ambuj Tewari
93
286
0
09 May 2012
1