arXiv:2301.08442 (v2, latest)
Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning
20 January 2023
Haoxuan Pan
Deheng Ye
Xiaoming Duan
Qiang Fu
Wei Yang
Jianping He
Mingfei Sun
OffRL
Papers citing "Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning"
22 / 22 papers shown
Title
You May Not Need Ratio Clipping in PPO
Mingfei Sun
Vitaly Kurin
Guoqing Liu
Sam Devlin
Tao Qin
Katja Hofmann
Shimon Whiteson
54
16
0
31 Jan 2022
Modeling Strong and Human-Like Gameplay with KL-Regularized Search
Athul Paul Jacob
David J. Wu
Gabriele Farina
Adam Lerer
Hengyuan Hu
A. Bakhtin
Jacob Andreas
Noam Brown
52
54
0
14 Dec 2021
Representation Learning for Online and Offline RL in Low-rank MDPs
Masatoshi Uehara
Xuezhou Zhang
Wen Sun
OffRL
126
129
0
09 Oct 2021
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
J. Kuba
Ruiqing Chen
Munning Wen
Ying Wen
Fanglei Sun
Jun Wang
Yaodong Yang
112
245
0
23 Sep 2021
Muesli: Combining Improvements in Policy Optimization
Matteo Hessel
Ivo Danihelka
Fabio Viola
A. Guez
Simon Schmitt
Laurent Sifre
T. Weber
David Silver
H. V. Hasselt
86
66
0
13 Apr 2021
Return-Based Contrastive Representation Learning for Reinforcement Learning
Guoqing Liu
Wei Shen
Li Zhao
Tao Qin
Jinhua Zhu
Jian Li
Nenghai Yu
Tie-Yan Liu
SSL
OffRL
88
48
0
22 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration
N. Lazić
Botao Hao
Yasin Abbasi-Yadkori
Dale Schuurmans
Csaba Szepesvári
44
11
0
11 Feb 2021
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal
Marlos C. Machado
Pablo Samuel Castro
Marc G. Bellemare
OffRL
98
168
0
13 Jan 2021
Phasic Policy Gradient
K. Cobbe
Jacob Hilton
Oleg Klimov
John Schulman
OffRL
62
159
0
09 Sep 2020
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Lior Shani
Yonathan Efroni
Shie Mannor
57
176
0
06 Sep 2019
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
Adam Stooke
Pieter Abbeel
OffRL
77
98
0
03 Sep 2019
Improving Deep Reinforcement Learning in Minecraft with Action Advice
Spencer Frazier
Mark O. Riedl
73
29
0
02 Aug 2019
Is the Policy Gradient a Gradient?
Chris Nota
Philip S. Thomas
78
58
0
17 Jun 2019
P3O: Policy-on Policy-off Policy Optimization
Rasool Fakoor
Pratik Chaudhari
Alex Smola
OffRL
66
55
0
05 May 2019
Relative Entropy Regularized Policy Iteration
A. Abdolmaleki
Jost Tobias Springenberg
Jonas Degrave
Steven Bohez
Yuval Tassa
Dan Belov
N. Heess
Martin Riedmiller
65
72
0
05 Dec 2018
A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou
Li Shen
Zequn Jie
Weizhong Zhang
Wei Liu
63
372
0
23 Nov 2018
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
189
5,218
0
26 Feb 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
264
1,901
0
28 Dec 2017
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
74
107
0
06 Jul 2017
Generative Adversarial Imitation Learning
Jonathan Ho
Stefano Ermon
GAN
159
3,125
0
10 Jun 2016
OpenAI Gym
Greg Brockman
Vicki Cheung
Ludwig Pettersson
Jonas Schneider
John Schulman
Jie Tang
Wojciech Zaremba
OffRL
ODL
223
5,087
0
05 Jun 2016
ADADELTA: An Adaptive Learning Rate Method
Matthew D. Zeiler
ODL
165
6,635
0
22 Dec 2012