1506.02438
High-Dimensional Continuous Control Using Generalized Advantage Estimation
8 June 2015
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
Papers citing "High-Dimensional Continuous Control Using Generalized Advantage Estimation"
27 / 77 papers shown
Title
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
161
33
0
17 Jun 2024
GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
Zhehua Zhou
Xuan Xie
Jiayang Song
Zhan Shu
Lei Ma
91
1
0
06 Jun 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
114
2
0
30 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
290
54
0
23 May 2024
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma
Zekun Wang
Joyce Chai
111
4
0
22 May 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Xiaolong Li
SyDa
LRM
ReLM
153
38
0
16 Apr 2024
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
Guangchen Lan
Dong-Jun Han
Abolfazl Hashemi
Vaneet Aggarwal
Christopher G. Brinton
182
16
0
09 Apr 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
146
17
0
07 Apr 2024
Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning
Jan-Philipp von Bassewitz
Sebastian Kaltenbach
Petros Koumoutsakos
AI4CE
99
2
0
01 Feb 2024
An Invitation to Deep Reinforcement Learning
Bernhard Jaeger
Andreas Geiger
OffRL
OOD
131
5
0
13 Dec 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan.Z Sheng
Julian McAuley
Lina Yao
SyDa
153
13
0
28 Aug 2023
Online Reinforcement Learning in Non-Stationary Context-Driven Environments
Pouya Hamadanian
Arash Nasr-Esfahany
Malte Schwarzkopf
Siddartha Sen
MohammadIman Alizadeh
CLL
OffRL
110
0
0
04 Feb 2023
Proximal Policy Optimization with Graph Neural Networks for Optimal Power Flow
Ángela López-Cardona
Guillermo Bernárdez
Pere Barlet-Ros
A. Cabellos-Aparicio
182
4
0
23 Dec 2022
Learning Progress Driven Multi-Agent Curriculum
Wenshuai Zhao
Zhiyuan Li
Joni Pajarinen
86
0
0
20 May 2022
Generative Design by Reinforcement Learning: Enhancing the Diversity of Topology Optimization Designs
Seowoo Jang
Soyoung Yoo
Namwoo Kang
AI4CE
66
72
0
17 Aug 2020
COLREG-Compliant Collision Avoidance for Unmanned Surface Vehicle using Deep Reinforcement Learning
Eivind Meyer
Amalie Heiberg
Adil Rasheed
Omer San
70
74
0
16 Jun 2020
Agent Modelling under Partial Observability for Deep Reinforcement Learning
Georgios Papoudakis
Filippos Christianos
Stefano V. Albrecht
67
65
0
16 Jun 2020
Residual Force Control for Agile Human Behavior Imitation and Extended Motion Synthesis
Ye Yuan
Kris Kitani
133
77
0
12 Jun 2020
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning
Nat Dilokthanakul
Christos Kaplanis
Nick Pawlowski
Murray Shanahan
68
92
0
18 May 2017
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
Joshua Achiam
S. Shankar Sastry
71
238
0
06 Mar 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
169
474
0
28 Feb 2017
Learning Control for Air Hockey Striking using Deep Reinforcement Learning
Ayal Taitler
N. Shimkin
55
10
0
26 Feb 2017
Collaborative Deep Reinforcement Learning
Kaixiang Lin
Shu Wang
Jiayu Zhou
75
22
0
19 Feb 2017
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
Shai Shalev-Shwartz
Shaked Shammah
Amnon Shashua
111
840
0
11 Oct 2016
Learning Continuous Control Policies by Stochastic Value Gradients
N. Heess
Greg Wayne
David Silver
Timothy Lillicrap
Yuval Tassa
Tom Erez
97
560
0
30 Oct 2015
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
325
13,286
0
09 Sep 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,796
0
19 Feb 2015