ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1912.02074
  4. Cited By
AlgaeDICE: Policy Gradient from Arbitrary Experience

AlgaeDICE: Policy Gradient from Arbitrary Experience

4 December 2019
Ofir Nachum
Bo Dai
Ilya Kostrikov
Yinlam Chow
Lihong Li
Dale Schuurmans
    OffRL
ArXivPDFHTML

Papers citing "AlgaeDICE: Policy Gradient from Arbitrary Experience"

40 / 40 papers shown
Title
Dual Alignment Maximin Optimization for Offline Model-based RL
Dual Alignment Maximin Optimization for Offline Model-based RL
Chi Zhou
Wang Luo
Haoran Li
Congying Han
Tiande Guo
Zicheng Zhang
OffRL
106
0
0
02 Feb 2025
OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment
OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment
Yooseok Lim
Sujee Lee
OffRL
195
0
0
03 Jan 2025
On-Robot Reinforcement Learning with Goal-Contrastive Rewards
On-Robot Reinforcement Learning with Goal-Contrastive Rewards
Ondrej Biza
Thomas Weng
Lingfeng Sun
Karl Schmeckpeper
Tarik Kelestemur
Yecheng Jason Ma
Robert Platt
Jan-Willem van de Meent
Lawson L. S. Wong
OffRL
65
0
0
25 Oct 2024
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Linjiajie Fang
Ruoxue Liu
Jing Zhang
Wenjia Wang
Bing-Yi Jing
OffRL
75
7
0
31 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Bram De Cooman
Johan A. K. Suykens
48
0
0
25 Apr 2024
DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
Aayam Shrestha
Stefan Lee
Prasad Tadepalli
Alan Fern
OffRL
80
23
0
18 Oct 2020
GenDICE: Generalized Offline Estimation of Stationary Values
GenDICE: Generalized Offline Estimation of Stationary Values
Ruiyi Zhang
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
121
173
0
21 Feb 2020
Imitation Learning via Off-Policy Distribution Matching
Imitation Learning via Off-Policy Distribution Matching
Ilya Kostrikov
Ofir Nachum
Jonathan Tompson
OOD
OffRL
52
204
0
10 Dec 2019
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
85
186
0
28 Oct 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
84
68
0
16 Oct 2019
Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real
Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real
Ofir Nachum
Michael Ahn
Hugo Ponte
S. Gu
Vikash Kumar
48
90
0
13 Aug 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary
  Distribution Corrections
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
81
332
0
10 Jun 2019
A Kernel Loss for Solving the Bellman Equation
A Kernel Loss for Solving the Bellman Equation
Yihao Feng
Lihong Li
Qiang Liu
45
70
0
25 May 2019
Off-Policy Policy Gradient with State Distribution Correction
Off-Policy Policy Gradient with State Distribution Correction
Yao Liu
Adith Swaminathan
Alekh Agarwal
Emma Brunskill
OffRL
80
67
0
17 Apr 2019
A Theory of Regularized Markov Decision Processes
A Theory of Regularized Markov Decision Processes
Matthieu Geist
B. Scherrer
Olivier Pietquin
84
317
0
31 Jan 2019
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
82
354
0
29 Oct 2018
Neural Approaches to Conversational AI
Neural Approaches to Conversational AI
Jianfeng Gao
Michel Galley
Lihong Li
74
672
0
21 Sep 2018
Learning Dexterous In-Hand Manipulation
Learning Dexterous In-Hand Manipulation
OpenAI OpenAI
Marcin Andrychowicz
Bowen Baker
Maciek Chociej
Rafal Jozefowicz
...
Szymon Sidor
Joshua Tobin
Peter Welinder
Lilian Weng
Wojciech Zaremba
70
1,865
0
01 Aug 2018
Scalable Bilinear $π$ Learning Using State and Action Features
Scalable Bilinear πππ Learning Using State and Action Features
Yichen Chen
Lihong Li
Mengdi Wang
31
46
0
27 Apr 2018
Smoothed Action Value Functions for Learning Gaussian Policies
Smoothed Action Value Functions for Learning Gaussian Policies
Ofir Nachum
Mohammad Norouzi
George Tucker
Dale Schuurmans
68
28
0
06 Mar 2018
Addressing Function Approximation Error in Actor-Critic Methods
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
139
5,121
0
26 Feb 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement
  Learning with a Stochastic Actor
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
194
8,236
0
04 Jan 2018
f-Divergence constrained policy improvement
f-Divergence constrained policy improvement
Boris Belousov
Jan Peters
30
19
0
29 Dec 2017
Boosting the Actor with Dual Critic
Boosting the Actor with Dual Critic
Bo Dai
Albert Eaton Shaw
Niao He
Lihong Li
Le Song
50
46
0
29 Dec 2017
Rainbow: Combining Improvements in Deep Reinforcement Learning
Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel
Joseph Modayil
H. V. Hasselt
Tom Schaul
Georg Ostrovski
Will Dabney
Dan Horgan
Bilal Piot
M. G. Azar
David Silver
OffRL
91
2,255
0
06 Oct 2017
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
203
18,685
0
20 Jul 2017
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
49
106
0
06 Jul 2017
A unified view of entropy-regularized Markov decision processes
A unified view of entropy-regularized Markov decision processes
Gergely Neu
Anders Jonsson
Vicencc Gómez
82
255
0
22 May 2017
The Reactor: A fast and sample-efficient Actor-Critic agent for
  Reinforcement Learning
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
A. Gruslys
Will Dabney
M. G. Azar
Bilal Piot
Marc G. Bellemare
Rémi Munos
36
58
0
15 Apr 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
104
469
0
28 Feb 2017
Reinforcement Learning with Deep Energy-Based Policies
Reinforcement Learning with Deep Energy-Based Policies
Tuomas Haarnoja
Haoran Tang
Pieter Abbeel
Sergey Levine
57
1,329
0
27 Feb 2017
Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement
  Learning
Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning
Yichen Chen
Mengdi Wang
50
64
0
08 Dec 2016
Sample Efficient Actor-Critic with Experience Replay
Sample Efficient Actor-Critic with Experience Replay
Ziyun Wang
V. Bapst
N. Heess
Volodymyr Mnih
Rémi Munos
Koray Kavukcuoglu
Nando de Freitas
79
757
0
03 Nov 2016
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
119
611
0
08 Jun 2016
Taming the Noise in Reinforcement Learning via Soft Updates
Taming the Noise in Reinforcement Learning via Soft Updates
Roy Fox
Ari Pakman
Naftali Tishby
33
336
0
28 Dec 2015
Continuous control with deep reinforcement learning
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
176
13,174
0
09 Sep 2015
Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
95
12,163
0
19 Dec 2013
Off-policy Learning with Eligibility Traces: A Survey
Off-policy Learning with Eligibility Traces: A Survey
Matthieu Geist
B. Scherrer
OffRL
43
94
0
15 Apr 2013
Off-Policy Actor-Critic
Off-Policy Actor-Critic
T. Degris
Martha White
R. Sutton
OffRL
CML
210
220
0
22 May 2012
Estimating divergence functionals and the likelihood ratio by convex
  risk minimization
Estimating divergence functionals and the likelihood ratio by convex risk minimization
X. Nguyen
Martin J. Wainwright
Michael I. Jordan
149
799
0
04 Sep 2008
1