arXiv:1611.02247
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
7 November 2016
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Sergey Levine
OffRL
BDL
Papers citing "Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic"
50 / 196 papers shown
Title
A unified view of likelihood ratio and reparameterization gradients
Paavo Parmas
Masashi Sugiyama
22
9
0
31 May 2021
Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning
Jie Ren
Yewen Li
Zihan Ding
Wei Pan
Hao Dong
BDL
MoE
21
25
0
19 Apr 2021
Bi-level Off-policy Reinforcement Learning for Volt/VAR Control Involving Continuous and Discrete Devices
Haotian Liu
Wenchuan Wu
OffRL
16
6
0
13 Apr 2021
Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
Hiroki Furuta
Tadashi Kozuno
T. Matsushima
Y. Matsuo
S. Gu
18
14
0
31 Mar 2021
Learning Reactive and Predictive Differentiable Controllers for Switching Linear Dynamical Models
Saumya Saxena
A. LaGrassa
Oliver Kroemer
23
4
0
26 Mar 2021
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
Hiroki Furuta
T. Matsushima
Tadashi Kozuno
Y. Matsuo
Sergey Levine
Ofir Nachum
S. Gu
OffRL
19
13
0
23 Mar 2021
Combining Off and On-Policy Training in Model-Based Reinforcement Learning
Alexandre Borges
Arlindo L. Oliveira
17
2
0
24 Feb 2021
On Proximal Policy Optimization's Heavy-tailed Gradients
Saurabh Garg
Joshua Zhanson
Emilio Parisotto
Adarsh Prasad
J. Zico Kolter
Zachary Chase Lipton
Sivaraman Balakrishnan
Ruslan Salakhutdinov
Pradeep Ravikumar
25
11
0
20 Feb 2021
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
Florian E. Dorner
25
12
0
09 Feb 2021
Multi-hop RIS-Empowered Terahertz Communications: A DRL-based Hybrid Beamforming Design
Chongwen Huang
Zhaohui Yang
G. C. Alexandropoulos
Kai Xiong
Li Wei
Chau Yuen
Zhaoyang Zhang
Merouane Debbah
32
327
0
22 Jan 2021
Reinforcement Learning for Robust Missile Autopilot Design
Bernardo Cortez
11
2
0
26 Nov 2020
Critic PI2: Master Continuous Planning via Policy Improvement with Path Integrals and Deep Actor-Critic Reinforcement Learning
Jiajun Fan
He Ba
Xian Guo
Jianye Hao
OffRL
19
5
0
13 Nov 2020
Variance-Reduced Off-Policy Memory-Efficient Policy Search
Daoming Lyu
Qi Qi
Mohammad Ghavamzadeh
Hengshuai Yao
Tianbao Yang
Bo Liu
OffRL
20
7
0
14 Sep 2020
Extended Radial Basis Function Controller for Reinforcement Learning
Nicholas Capel
Naifu Zhang
6
1
0
12 Sep 2020
Visualizing the Loss Landscape of Actor Critic Methods with Applications in Inventory Optimization
Recep Yusuf Bekci
M. Gümüş
14
4
0
04 Sep 2020
Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung
Valentin Thomas
Marlos C. Machado
Nicolas Le Roux
OffRL
19
22
0
31 Aug 2020
Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
Vinicius G. Goecks
33
11
0
30 Aug 2020
Modular Transfer Learning with Transition Mismatch Compensation for Excessive Disturbance Rejection
Tianming Wang
Wenjie Lu
H. Yu
Dikai Liu
41
1
0
29 Jul 2020
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
211
119
0
21 Jul 2020
Momentum-Based Policy Gradient Methods
Feihu Huang
Shangqian Gao
J. Pei
Heng-Chiao Huang
22
38
0
13 Jul 2020
Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation
Shiyang Yan
Yang Hua
N. Robertson
OffRL
13
0
0
21 Jun 2020
Zeroth-Order Supervised Policy Improvement
Hao Sun
Ziping Xu
Yuhang Song
Meng Fang
Jiechao Xiong
Bo Dai
Bolei Zhou
OffRL
14
9
0
11 Jun 2020
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
T. Matsushima
Hiroki Furuta
Y. Matsuo
Ofir Nachum
S. Gu
OffRL
25
146
0
05 Jun 2020
Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration
Seungyul Han
Y. Sung
14
24
0
02 Jun 2020
MOPO: Model-based Offline Policy Optimization
Tianhe Yu
G. Thomas
Lantao Yu
Stefano Ermon
James Zou
Sergey Levine
Chelsea Finn
Tengyu Ma
OffRL
27
754
0
27 May 2020
Transferable Active Grasping and Real Embodied Dataset
Xiangyu Chen
Zelin Ye
Jiankai Sun
Yuda Fan
Fangwei Hu
Chenxi Wang
Cewu Lu
22
19
0
28 Apr 2020
Can We Learn Heuristics For Graphical Model Inference Using Reinforcement Learning?
Safa Messaoud
Maghav Kumar
A. Schwing
30
5
0
27 Apr 2020
Accelerating Deep Reinforcement Learning With the Aid of Partial Model: Energy-Efficient Predictive Video Streaming
Dong Liu
Jianyu Zhao
Chenyang Yang
L. Hanzo
22
1
0
21 Mar 2020
Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration
Guy Van den Broeck
Yitao Liang
Mathias Niepert
OffRL
19
3
0
25 Feb 2020
On the Search for Feedback in Reinforcement Learning
Ran A. Wang
Karthikeya S. Parunandi
Aayushman Sharma
R. Goyal
S. Chakravorty
11
9
0
21 Feb 2020
Adaptive Experience Selection for Policy Gradient
S. Mohamad
Giovanni Montana
39
0
0
17 Feb 2020
Discrete Action On-Policy Learning with Action-Value Critic
Yuguang Yue
Yunhao Tang
Mingzhang Yin
Mingyuan Zhou
OffRL
14
5
0
10 Feb 2020
Sample-based Distributional Policy Gradient
Rahul Singh
Keuntaek Lee
Yongxin Chen
18
19
0
08 Jan 2020
Soft Q Network
Jingbin Liu
Shuai Liu
Xinyang Gu
OffRL
22
2
0
20 Dec 2019
Policy Optimization Reinforcement Learning with Entropy Regularization
Jingbin Liu
Xinyang Gu
Shuai Liu
22
4
0
02 Dec 2019
IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Michael Luo
Jiahao Yao
Richard Liaw
Eric Liang
Ion Stoica
22
15
0
30 Nov 2019
Merging Deterministic Policy Gradient Estimations with Varied Bias-Variance Tradeoff for Effective Deep Reinforcement Learning
Gang Chen
25
4
0
24 Nov 2019
Multi-Path Policy Optimization
L. Pan
Qingpeng Cai
Longbo Huang
18
2
0
11 Nov 2019
A2: Extracting Cyclic Switchings from DOB-nets for Rejecting Excessive Disturbances
Wenjie Lu
Dikai Liu
8
0
0
01 Nov 2019
Better Exploration with Optimistic Actor-Critic
K. Ciosek
Q. Vuong
R. Loftin
Katja Hofmann
29
149
0
28 Oct 2019
From Importance Sampling to Doubly Robust Policy Gradient
Jiawei Huang
Nan Jiang
OffRL
30
24
0
20 Oct 2019
A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme
Paavo Parmas
Masashi Sugiyama
19
3
0
14 Oct 2019
Segregation Dynamics with Reinforcement Learning and Agent Based Modeling
Egemen Sert
Y. Bar-Yam
A. Morales
16
39
0
18 Sep 2019
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning
Wenjie Shi
Shiji Song
Hui Wu
Yachu Hsu
Cheng Wu
Gao Huang
11
25
0
07 Sep 2019
Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning
Wenjie Shi
Shiji Song
Cheng Wu
25
36
0
07 Sep 2019
Reinforcement learning with world model
Jingbin Liu
Xinyang Gu
Shuai Liu
24
0
0
30 Aug 2019
Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods
Ching-An Cheng
Xinyan Yan
Byron Boots
25
22
0
08 Aug 2019
Hindsight Trust Region Policy Optimization
Hanbo Zhang
Site Bai
Xuguang Lan
David Hsu
Nanning Zheng
38
8
0
29 Jul 2019
Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges
Lei Lei
Yue Tan
Kan Zheng
Shiwen Liu
K. Zheng
Xuemin Shen
OffRL
21
202
0
22 Jul 2019
Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling
Yuping Luo
Huazhe Xu
Tengyu Ma
SSL
26
13
0
12 Jul 2019