arXiv: 2004.14309
How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization
29 April 2020
P. D'Oro
Wojciech Jaśkowski
OffRL
Papers citing
"How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization"
7 / 7 papers shown

| Title | Authors | Topics | Counts | Date |
|---|---|---|---|---|
| Learning a Diffusion Model Policy from Rewards via Q-Score Matching | Michael Psenka, Alejandro Escontrela, Pieter Abbeel, Yi Ma | DiffM | 93 · 24 · 0 | 17 Feb 2025 |
| Compatible Gradient Approximations for Actor-Critic Algorithms | Baturay Saglam, Dionysis Kalogerias | | 31 · 0 · 0 | 02 Sep 2024 |
| Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse | Jiafei Lyu, Le Wan, Zongqing Lu, Xiu Li | OffRL | 34 · 9 · 0 | 29 May 2023 |
| Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function | Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang | | 48 · 13 · 0 | 02 Feb 2023 |
| The Primacy Bias in Deep Reinforcement Learning | Evgenii Nikishin, Max Schwarzer, P. D'Oro, Pierre-Luc Bacon, Rameswar Panda | OnRL | 96 · 180 · 0 | 16 May 2022 |
| A case for new neural network smoothness constraints | Mihaela Rosca, T. Weber, Arthur Gretton, S. Mohamed | AAML | 35 · 48 · 0 | 14 Dec 2020 |
| What About Inputting Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator | Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chong Chen, D. Graves, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang | OffRL | 14 · 7 · 0 | 19 Oct 2020 |