Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

7 November 2016
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Sergey Levine
    OffRL
    BDL

Papers citing "Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic"

50 / 196 papers shown
A unified view of likelihood ratio and reparameterization gradients
Paavo Parmas
Masashi Sugiyama
22
9
0
31 May 2021
Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning
Jie Ren
Yewen Li
Zihan Ding
Wei Pan
Hao Dong
BDL
MoE
21
25
0
19 Apr 2021
Bi-level Off-policy Reinforcement Learning for Volt/VAR Control Involving Continuous and Discrete Devices
Haotian Liu
Wenchuan Wu
OffRL
16
6
0
13 Apr 2021
Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
Hiroki Furuta
Tadashi Kozuno
T. Matsushima
Y. Matsuo
S. Gu
18
14
0
31 Mar 2021
Learning Reactive and Predictive Differentiable Controllers for Switching Linear Dynamical Models
Saumya Saxena
A. LaGrassa
Oliver Kroemer
23
4
0
26 Mar 2021
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
Hiroki Furuta
T. Matsushima
Tadashi Kozuno
Y. Matsuo
Sergey Levine
Ofir Nachum
S. Gu
OffRL
19
13
0
23 Mar 2021
Combining Off and On-Policy Training in Model-Based Reinforcement Learning
Alexandre Borges
Arlindo L. Oliveira
17
2
0
24 Feb 2021
On Proximal Policy Optimization's Heavy-tailed Gradients
Saurabh Garg
Joshua Zhanson
Emilio Parisotto
Adarsh Prasad
J. Zico Kolter
Zachary Chase Lipton
Sivaraman Balakrishnan
Ruslan Salakhutdinov
Pradeep Ravikumar
25
11
0
20 Feb 2021
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
Florian E. Dorner
25
12
0
09 Feb 2021
Multi-hop RIS-Empowered Terahertz Communications: A DRL-based Hybrid Beamforming Design
Chongwen Huang
Zhaohui Yang
G. C. Alexandropoulos
Kai Xiong
Li Wei
Chau Yuen
Zhaoyang Zhang
Merouane Debbah
32
327
0
22 Jan 2021
Reinforcement Learning for Robust Missile Autopilot Design
Bernardo Cortez
11
2
0
26 Nov 2020
Critic PI2: Master Continuous Planning via Policy Improvement with Path Integrals and Deep Actor-Critic Reinforcement Learning
Jiajun Fan
He Ba
Xian Guo
Jianye Hao
OffRL
19
5
0
13 Nov 2020
Variance-Reduced Off-Policy Memory-Efficient Policy Search
Daoming Lyu
Qi Qi
Mohammad Ghavamzadeh
Hengshuai Yao
Tianbao Yang
Bo Liu
OffRL
20
7
0
14 Sep 2020
Extended Radial Basis Function Controller for Reinforcement Learning
Nicholas Capel
Naifu Zhang
6
1
0
12 Sep 2020
Visualizing the Loss Landscape of Actor Critic Methods with Applications in Inventory Optimization
Recep Yusuf Bekci
M. Gümüş
14
4
0
04 Sep 2020
Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung
Valentin Thomas
Marlos C. Machado
Nicolas Le Roux
OffRL
19
22
0
31 Aug 2020
Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
Vinicius G. Goecks
33
11
0
30 Aug 2020
Modular Transfer Learning with Transition Mismatch Compensation for Excessive Disturbance Rejection
Tianming Wang
Wenjie Lu
H. Yu
Dikai Liu
41
1
0
29 Jul 2020
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
211
119
0
21 Jul 2020
Momentum-Based Policy Gradient Methods
Feihu Huang
Shangqian Gao
J. Pei
Heng-Chiao Huang
22
38
0
13 Jul 2020
Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation
Shiyang Yan
Yang Hua
N. Robertson
OffRL
13
0
0
21 Jun 2020
Zeroth-Order Supervised Policy Improvement
Hao Sun
Ziping Xu
Yuhang Song
Meng Fang
Jiechao Xiong
Bo Dai
Bolei Zhou
OffRL
14
9
0
11 Jun 2020
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
T. Matsushima
Hiroki Furuta
Y. Matsuo
Ofir Nachum
S. Gu
OffRL
25
146
0
05 Jun 2020
Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration
Seungyul Han
Y. Sung
14
24
0
02 Jun 2020
MOPO: Model-based Offline Policy Optimization
Tianhe Yu
G. Thomas
Lantao Yu
Stefano Ermon
James Zou
Sergey Levine
Chelsea Finn
Tengyu Ma
OffRL
27
754
0
27 May 2020
Transferable Active Grasping and Real Embodied Dataset
Xiangyu Chen
Zelin Ye
Jiankai Sun
Yuda Fan
Fangwei Hu
Chenxi Wang
Cewu Lu
22
19
0
28 Apr 2020
Can We Learn Heuristics For Graphical Model Inference Using Reinforcement Learning?
Safa Messaoud
Maghav Kumar
A. Schwing
30
5
0
27 Apr 2020
Accelerating Deep Reinforcement Learning With the Aid of Partial Model: Energy-Efficient Predictive Video Streaming
Dong Liu
Jianyu Zhao
Chenyang Yang
L. Hanzo
22
1
0
21 Mar 2020
Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration
Guy Van den Broeck
Yitao Liang
Mathias Niepert
OffRL
19
3
0
25 Feb 2020
On the Search for Feedback in Reinforcement Learning
Ran A. Wang
Karthikeya S. Parunandi
Aayushman Sharma
R. Goyal
S. Chakravorty
11
9
0
21 Feb 2020
Adaptive Experience Selection for Policy Gradient
S. Mohamad
Giovanni Montana
39
0
0
17 Feb 2020
Discrete Action On-Policy Learning with Action-Value Critic
Yuguang Yue
Yunhao Tang
Mingzhang Yin
Mingyuan Zhou
OffRL
14
5
0
10 Feb 2020
Sample-based Distributional Policy Gradient
Rahul Singh
Keuntaek Lee
Yongxin Chen
18
19
0
08 Jan 2020
Soft Q Network
Jingbin Liu
Shuai Liu
Xinyang Gu
OffRL
22
2
0
20 Dec 2019
Policy Optimization Reinforcement Learning with Entropy Regularization
Jingbin Liu
Xinyang Gu
Shuai Liu
22
4
0
02 Dec 2019
IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Michael Luo
Jiahao Yao
Richard Liaw
Eric Liang
Ion Stoica
22
15
0
30 Nov 2019
Merging Deterministic Policy Gradient Estimations with Varied Bias-Variance Tradeoff for Effective Deep Reinforcement Learning
Gang Chen
25
4
0
24 Nov 2019
Multi-Path Policy Optimization
L. Pan
Qingpeng Cai
Longbo Huang
18
2
0
11 Nov 2019
A2: Extracting Cyclic Switchings from DOB-nets for Rejecting Excessive Disturbances
Wenjie Lu
Dikai Liu
8
0
0
01 Nov 2019
Better Exploration with Optimistic Actor-Critic
K. Ciosek
Q. Vuong
R. Loftin
Katja Hofmann
29
149
0
28 Oct 2019
From Importance Sampling to Doubly Robust Policy Gradient
Jiawei Huang
Nan Jiang
OffRL
30
24
0
20 Oct 2019
A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme
Paavo Parmas
Masashi Sugiyama
19
3
0
14 Oct 2019
Segregation Dynamics with Reinforcement Learning and Agent Based Modeling
Egemen Sert
Y. Bar-Yam
A. Morales
16
39
0
18 Sep 2019
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning
Wenjie Shi
Shiji Song
Hui Wu
Yachu Hsu
Cheng Wu
Gao Huang
11
25
0
07 Sep 2019
Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning
Wenjie Shi
Shiji Song
Cheng Wu
25
36
0
07 Sep 2019
Reinforcement learning with world model
Jingbin Liu
Xinyang Gu
Shuai Liu
24
0
0
30 Aug 2019
Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods
Ching-An Cheng
Xinyan Yan
Byron Boots
25
22
0
08 Aug 2019
Hindsight Trust Region Policy Optimization
Hanbo Zhang
Site Bai
Xuguang Lan
David Hsu
Nanning Zheng
38
8
0
29 Jul 2019
Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges
Lei Lei
Yue Tan
Kan Zheng
Shiwen Liu
K. Zheng
Xuemin Shen
OffRL
21
202
0
22 Jul 2019
Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling
Yuping Luo
Huazhe Xu
Tengyu Ma
SSL
26
13
0
12 Jul 2019