ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1811.06225
  4. Cited By
Reward-estimation variance elimination in sequential decision processes

Reward-estimation variance elimination in sequential decision processes

15 November 2018
S. Pankov
ArXivPDFHTML

Papers citing "Reward-estimation variance elimination in sequential decision processes"

10 / 10 papers shown
Title
Multi-Fidelity Policy Gradient Algorithms
Multi-Fidelity Policy Gradient Algorithms
Xinjie Liu
Cyrus Neary
Kushagra Gupta
Christian Ellis
Ufuk Topcu
David Fridovich-Keil
OffRL
359
0
0
07 Mar 2025
Variance Reduction for Policy Gradient with Action-Dependent Factorized
  Baselines
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
Cathy Wu
Aravind Rajeswaran
Yan Duan
Vikash Kumar
Alexandre M. Bayen
Sham Kakade
Igor Mordatch
Pieter Abbeel
OffRL
35
151
0
20 Mar 2018
The Mirage of Action-Dependent Baselines in Reinforcement Learning
The Mirage of Action-Dependent Baselines in Reinforcement Learning
George Tucker
Surya Bhupatiraju
S. Gu
Richard Turner
Zoubin Ghahramani
Sergey Levine
OffRL
41
127
0
27 Feb 2018
Backpropagation through the Void: Optimizing control variates for
  black-box gradient estimation
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
Will Grathwohl
Dami Choi
Yuhuai Wu
Geoffrey Roeder
David Duvenaud
75
300
0
31 Oct 2017
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Sergey Levine
OffRL
BDL
58
344
0
07 Nov 2016
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous
  Off-Policy Updates
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates
S. Gu
E. Holly
Timothy Lillicrap
Sergey Levine
OffRL
SSL
82
1,474
0
03 Oct 2016
Continuous control with deep reinforcement learning
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
143
13,174
0
09 Sep 2015
High-Dimensional Continuous Control Using Generalized Advantage
  Estimation
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
31
3,368
0
08 Jun 2015
Trust Region Policy Optimization
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
221
6,722
0
19 Feb 2015
Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
63
12,163
0
19 Dec 2013
1