ResearchTrend.AI
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

7 November 2016
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Sergey Levine
    OffRL
    BDL

Papers citing "Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic"

50 / 195 papers shown
RLAP: A Reinforcement Learning Enhanced Adaptive Planning Framework for Multi-step NLP Task Solving
Zepeng Ding
Dixuan Wang
Ziqin Luo
Guochao Jiang
Deqing Yang
Jiaqing Liang
2
0
0
17 May 2025
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning
Kalyan Cherukuri
Aarav Lala
Yash Yardi
2
0
0
17 May 2025
Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning
Yongshuai Liu
Xin Liu
93
1
0
26 Mar 2025
Multi-Fidelity Policy Gradient Algorithms
Xinjie Liu
Cyrus Neary
Kushagra Gupta
Christian Ellis
Ufuk Topcu
David Fridovich-Keil
OffRL
208
0
0
07 Mar 2025
How to Choose a Reinforcement-Learning Algorithm
Fabian Bongratz
Vladimir Golkov
Lukas Mautner
Luca Della Libera
Frederik Heetmeyer
Felix Czaja
Julian Rodemann
Daniel Cremers
34
1
0
30 Jul 2024
Disentangled Representations for Causal Cognition
Filippo Torresan
Manuel Baltieri
CML
49
1
0
30 Jun 2024
Transformers and Slot Encoding for Sample Efficient Physical World Modelling
Francesco Petri
Luigi Asprino
Aldo Gangemi
OCL
ViT
32
0
0
30 May 2024
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
Wenjia Meng
Qian Zheng
Long Yang
Yilong Yin
Gang Pan
OffRL
34
0
0
04 May 2024
S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic
Safa Messaoud
Billel Mokeddem
Zhenghai Xue
Linsey Pang
Bo An
Haipeng Chen
Sanjay Chawla
41
3
0
02 May 2024
Knowledge Transfer for Cross-Domain Reinforcement Learning: A Systematic Review
Sergio A. Serrano
J. Martínez-Carranza
L. Sucar
36
0
0
26 Apr 2024
When Do Off-Policy and On-Policy Policy Gradient Methods Align?
Davide Mambelli
Stephan Bongers
O. Zoeter
M. Spaan
F. Oliehoek
OffRL
21
0
0
19 Feb 2024
Multi-agent Reinforcement Learning: A Comprehensive Survey
Dom Huh
Prasant Mohapatra
AI4CE
36
8
0
15 Dec 2023
Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning
Jared Markowitz
Jesse Silverberg
Gary Collins
OffRL
23
0
0
30 Nov 2023
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Yifei Zhou
Ayush Sekhari
Yuda Song
Wen Sun
OffRL
OnRL
30
8
0
14 Nov 2023
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
Nicholas Corrado
Josiah P. Hanna
OffRL
20
1
0
14 Nov 2023
$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
Zishun Yu
Yunzhe Tao
Liyu Chen
Tao Sun
Hongxia Yang
32
9
0
04 Oct 2023
Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis
Alexander Meulemans
Simon Schug
Seijin Kobayashi
Nathaniel D. Daw
Gregory Wayne
29
3
0
29 Jun 2023
Maximum Causal Entropy Inverse Constrained Reinforcement Learning
Mattijs Baert
Pietro Mazzaglia
Sam Leroux
Pieter Simoens
CML
43
10
0
04 May 2023
Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments
Alain Andres
Lukas Schafer
Esther Villar-Rodriguez
Stefano V. Albrecht
Javier Del Ser
OffRL
OnRL
34
2
0
18 Apr 2023
Foundation Models for Decision Making: Problems, Methods, and Opportunities
Sherry Yang
Ofir Nachum
Yilun Du
Jason W. Wei
Pieter Abbeel
Dale Schuurmans
LM&Ro
OffRL
LRM
AI4CE
98
156
0
07 Mar 2023
Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization
Zichuan Lin
Xiapeng Wu
Mingfei Sun
Deheng Ye
Qiang Fu
Wei Yang
Wei Liu
28
3
0
05 Feb 2023
Online Reinforcement Learning in Non-Stationary Context-Driven Environments
Pouya Hamadanian
Arash Nasr-Esfahany
Malte Schwarzkopf
Siddartha Sen
MohammadIman Alizadeh
CLL
OffRL
50
0
0
04 Feb 2023
Distillation Policy Optimization
Jianfei Ma
OffRL
26
1
0
01 Feb 2023
Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
Hsin-En Su
Yen-Ju Chen
Ping-Chun Hsieh
Xi Liu
OffRL
26
0
0
10 Dec 2022
On Many-Actions Policy Gradient
Michal Nauman
Marek Cygan
19
0
0
24 Oct 2022
Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction
Taher Jafferjee
Juliusz Ziomek
Tianpei Yang
Zipeng Dai
Jianhong Wang
Matthew E. Taylor
Kun Shao
Jun Wang
D. Mguni
40
0
0
02 Sep 2022
Normality-Guided Distributional Reinforcement Learning for Continuous Control
Ju-Seung Byun
Andrew Perrault
OffRL
16
0
0
28 Aug 2022
Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking
R. EshwarS
Shishir Kolathaya
Gugan Thoppe
19
0
0
22 Aug 2022
Neural Set Function Extensions: Learning with Discrete Functions in High Dimensions
Nikolaos Karalias
Joshua Robinson
Andreas Loukas
Stefanie Jegelka
37
8
0
08 Aug 2022
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
32
2
0
28 Jun 2022
A Parametric Class of Approximate Gradient Updates for Policy Optimization
Ramki Gummadi
Saurabh Kumar
Junfeng Wen
Dale Schuurmans
26
0
0
17 Jun 2022
Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization
Maxim Kaledin
Alexander Golubev
Denis Belomestny
OffRL
26
3
0
14 Jun 2022
Simplifying Polylogarithms with Machine Learning
Aurélien Dersy
M. Schwartz
Xiao-Yan Zhang
AI4CE
21
16
0
08 Jun 2022
Disentangling Epistemic and Aleatoric Uncertainty in Reinforcement Learning
Bertrand Charpentier
Ransalu Senanayake
Mykel Kochenderfer
Stephan Günnemann
PER
UD
50
24
0
03 Jun 2022
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
Rafael Figueiredo Prudencio
Marcos R. O. A. Máximo
Esther Luna Colombini
OffRL
26
222
0
02 Mar 2022
Online Decision Transformer
Qinqing Zheng
Amy Zhang
Aditya Grover
OffRL
18
204
0
11 Feb 2022
Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning
Michael Teng
M. van de Panne
Frank Wood
OOD
OffRL
14
1
0
06 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
OffRL
21
1
0
31 Jan 2022
Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning
Brennan Gebotys
Alexander Wong
David A Clausi
20
2
0
22 Jan 2022
Demystifying Reinforcement Learning in Time-Varying Systems
Pouya Hamadanian
Malte Schwarzkopf
Siddartha Sen
MohammadIman Alizadeh
43
1
0
14 Jan 2022
Generalized Proximal Policy Optimization with Sample Reuse
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
37
47
0
29 Oct 2021
Hierarchical Primitive Composition: Simultaneous Activation of Skills with Inconsistent Action Dimensions in Multiple Hierarchies
Jeong-Hoon Lee
Jongeun Choi
26
8
0
05 Oct 2021
Federated Reinforcement Learning: Techniques, Applications, and Open Challenges
Jiaju Qi
Qihao Zhou
Lei Lei
Kan Zheng
FedML
31
145
0
26 Aug 2021
Settling the Variance of Multi-Agent Policy Gradients
J. Kuba
Muning Wen
Yaodong Yang
Linghui Meng
Shangding Gu
Haifeng Zhang
D. Mguni
Jun Wang
24
58
0
19 Aug 2021
Optimal Actor-Critic Policy with Optimized Training Datasets
C. Banerjee
Zhiyong Chen
N. Noman
M. Zamani
OffRL
33
7
0
16 Aug 2021
Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment
Jiaming Guo
Rui Zhang
Xishan Zhang
Shaohui Peng
Qiaomin Yi
Zidong Du
Xing Hu
Qi Guo
Yunji Chen
11
7
0
26 Jul 2021
Coordinate-wise Control Variates for Deep Policy Gradients
Yuanyi Zhong
Yuanshuo Zhou
Jian-wei Peng
BDL
24
1
0
11 Jul 2021
Survivable Robotic Control through Guided Bayesian Policy Search with Deep Reinforcement Learning
Sayyed Jaffar Ali Raza
Apan Dastider
Mingjie Lin
6
1
0
29 Jun 2021
Deep Reinforcement Learning for Conservation Decisions
Marcus Lapeyrolerie
Melissa S. Chapman
Kari E. A. Norman
C. Boettiger
OffRL
25
16
0
15 Jun 2021
A unified view of likelihood ratio and reparameterization gradients
Paavo Parmas
Masashi Sugiyama
20
9
0
31 May 2021