Trust Region Policy Optimization


19 February 2015
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel

Papers citing "Trust Region Policy Optimization"

50 / 1,216 papers shown
Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance
Jakob J. Hollenstein
Sayantan Auddy
Matteo Saveriano
Erwan Renaudo
J. Piater
41
17
0
08 Jun 2022
Real2Sim or Sim2Real: Robotics Visual Insertion using Deep Reinforcement Learning and Real2Sim Policy Adaptation
Yiwen Chen
Xue-Yong Li
Sheng Guo
Xiang Yao Ng
Marcelo H. Ang Jr
20
4
0
06 Jun 2022
Robust Adversarial Attacks Detection based on Explainable Deep Reinforcement Learning For UAV Guidance and Planning
Tom Hickling
Nabil Aouf
P. Spencer
AAML
19
50
0
06 Jun 2022
Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Runyu Zhang
Qinghua Liu
Haiquan Wang
Caiming Xiong
Na Li
Yu Bai
29
26
0
06 Jun 2022
Learning Dynamics and Generalization in Reinforcement Learning
Clare Lyle
Mark Rowland
Will Dabney
Marta Z. Kwiatkowska
Y. Gal
OOD
OffRL
30
12
0
05 Jun 2022
Algorithm for Constrained Markov Decision Process with Linear Convergence
E. Gladin
Maksim Lavrik-Karmazin
K. Zainullina
Varvara Rudenko
Alexander V. Gasnikov
Martin Takáč
33
6
0
03 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
25
51
0
01 Jun 2022
Control of Two-way Coupled Fluid Systems with Differentiable Solvers
B. Ramos
Felix Trost
Nils Thuerey
AI4CE
17
5
0
01 Jun 2022
Learning to Use Chopsticks in Diverse Gripping Styles
Zeshi Yang
KangKang Yin
Libin Liu
30
29
0
28 May 2022
Reward Uncertainty for Exploration in Preference-based Reinforcement Learning
Xinran Liang
Katherine Shu
Kimin Lee
Pieter Abbeel
21
58
0
24 May 2022
Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control
Maximilian Hüttenrauch
Gerhard Neumann
29
1
0
24 May 2022
Multi-Agent Collaborative Inference via DNN Decoupling: Intermediate Feature Compression and Edge Learning
Zhiwei Hao
Guanyu Xu
Yong Luo
Han Hu
Jianping An
Shiwen Mao
32
22
0
24 May 2022
Penalized Proximal Policy Optimization for Safe Reinforcement Learning
Linrui Zhang
Li Shen
Long Yang
Shi-Yong Chen
Bo Yuan
Xueqian Wang
Dacheng Tao
13
62
0
24 May 2022
Efficient Reinforcement Learning from Demonstration Using Local Ensemble and Reparameterization with Split and Merge of Expert Policies
Yu Wang
Fang Liu
29
0
0
23 May 2022
Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation
Qingfeng Lan
Yangchen Pan
Jun Luo
A. R. Mahmood
OffRL
36
8
0
22 May 2022
A Review of Safe Reinforcement Learning: Methods, Theory and Applications
Shangding Gu
Longyu Yang
Yali Du
Guang Chen
Florian Walter
Jun Wang
Alois C. Knoll
OffRL
AI4TS
117
241
0
20 May 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
41
8
0
20 May 2022
Qualitative Differences Between Evolutionary Strategies and Reinforcement Learning Methods for Control of Autonomous Agents
Nicola Milano
S. Nolfi
20
0
0
16 May 2022
Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments
Ryan Sullivan
J. K. Terry
Benjamin Black
John P. Dickerson
27
8
0
14 May 2022
Diverse Imitation Learning via Self-Organizing Generative Models
Arash Vahabpour
Tianyi Wang
Qiujing Lu
Omead Brandon Pooladzandi
V. Roychowdhury
SSL
26
1
0
06 May 2022
TTOpt: A Maximum Volume Quantized Tensor Train-based Optimization and its Application to Reinforcement Learning
Konstantin Sozykin
Andrei Chertkov
R. Schutski
Anh-Huy Phan
A. Cichocki
Ivan Oseledets
14
35
0
30 Apr 2022
From One Hand to Multiple Hands: Imitation Learning for Dexterous Manipulation from Single-Camera Teleoperation
Yuzhe Qin
Hao Su
Xiaolong Wang
27
99
0
26 Apr 2022
Road Traffic Law Adaptive Decision-making for Self-Driving Vehicles
Jiaxin Liu
Wenhui Zhou
Hong Wang
Zhong Cao
Wen-Hui Yu
Cheng-Yu Zhao
Ding Zhao
Diange Yang
Jun Li
30
23
0
25 Apr 2022
Learning to Constrain Policy Optimization with Virtual Trust Region
Hung Le
Thommen Karimpanal George
Majid Abdolshah
D. Nguyen
Kien Do
Sunil R. Gupta
Svetha Venkatesh
33
3
0
20 Apr 2022
Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models
Ali Ghadirzadeh
Petra Poklukar
Karol Arndt
Chelsea Finn
Ville Kyrki
Danica Kragic
Mårten Björkman
OffRL
22
1
0
18 Apr 2022
Reinforcement Learning Policy Recommendation for Interbank Network Stability
Alessio Brini
G. Tedeschi
Daniele Tantari
11
2
0
14 Apr 2022
Automatically Learning Fallback Strategies with Model-Free Reinforcement Learning in Safety-Critical Driving Scenarios
Ugo Lecerf
Christelle Yemdji Tchassi
S. Aubert
Pietro Michiardi
26
0
0
11 Apr 2022
Knowledge Infused Decoding
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
25
14
0
06 Apr 2022
Configuration Path Control
S. Pankov
27
1
0
05 Apr 2022
Learning Generalizable Dexterous Manipulation from Human Grasp Affordance
Yueh-hua Wu
Jiashun Wang
Xiaolong Wang
29
55
0
05 Apr 2022
Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
Zihan Zhou
Wei Fu
Bingliang Zhang
Yi Wu
25
28
0
04 Apr 2022
Robust Meta-Reinforcement Learning with Curriculum-Based Task Sampling
Morio Matsumoto
Hiroya Matsuba
Toshihiro Kujirai
16
2
0
31 Mar 2022
Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional Reasoning Approach
Xubo Lyu
Amin Banitalebi-Dehkordi
Mo Chen
Yong Zhang
32
2
0
29 Mar 2022
Aggressive Quadrotor Flight Using Curiosity-Driven Reinforcement Learning
Q. Sun
Jinbao Fang
Weixing Zheng
Yang Tang
19
27
0
26 Mar 2022
Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning
Pascal Weber
Daniel Wälchli
Mustafa Zeqiri
Petros Koumoutsakos
CLL
OffRL
21
7
0
24 Mar 2022
Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots
Yufeng Yuan
Rupam Mahmood
OffRL
31
19
0
23 Mar 2022
Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs
Siow Meng Low
Akshat Kumar
Scott Sanner
25
3
0
23 Mar 2022
Long Short-Term Memory for Spatial Encoding in Multi-Agent Path Planning
Marc R. Schlichting
S. Notter
W. Fichter
26
2
0
21 Mar 2022
Meta-Reinforcement Learning for the Tuning of PI Controllers: An Offline Approach
Daniel G. McClement
Nathan P. Lawrence
Johan U. Backstrom
Philip D. Loewen
M. Forbes
R. Bhushan Gopaluni
OffRL
24
22
0
17 Mar 2022
Combining imitation and deep reinforcement learning to accomplish human-level performance on a virtual foraging task
Vittorio Giammarino
Matthew F. Dunne
Kylie N. Moore
Michael Hasselmo
Chantal E. Stern
I. Paschalidis
OffRL
39
5
0
11 Mar 2022
Dimensionality Reduction and Prioritized Exploration for Policy Search
Marius Memmel
Puze Liu
Davide Tateo
Jan Peters
20
3
0
09 Mar 2022
A Practical AoI Scheduler in IoT Networks with Relays
Biplav Choudhury
Prasenjit Karmakar
Vijay K. Shah
Jeffrey H. Reed
11
1
0
08 Mar 2022
Distributed Control using Reinforcement Learning with Temporal-Logic-Based Reward Shaping
Ningyuan Zhang
Wenliang Liu
C. Belta
25
2
0
08 Mar 2022
The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
Simone Parisi
Aravind Rajeswaran
Senthil Purushwalkam
Abhinav Gupta
LM&Ro
34
187
0
07 Mar 2022
Safe Reinforcement Learning for Legged Locomotion
Tsung-Yen Yang
Tingnan Zhang
Linda Luu
Sehoon Ha
Jie Tan
Wenhao Yu
26
40
0
05 Mar 2022
Avalanche RL: a Continual Reinforcement Learning Library
Nicolo Lucchesi
Antonio Carta
Vincenzo Lomonaco
Davide Bacciu
39
6
0
28 Feb 2022
Neural-Progressive Hedging: Enforcing Constraints in Reinforcement Learning with Stochastic Programming
Supriyo Ghosh
L. Wynter
Shiau Hong Lim
D. Nguyen
34
0
0
27 Feb 2022
Learning to Schedule Heuristics for the Simultaneous Stochastic Optimization of Mining Complexes
Yassine Yaakoubi
R. Dimitrakopoulos
33
10
0
25 Feb 2022
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
Spyridon Mouselinos
Henryk Michalewski
Mateusz Malinowski
21
3
0
24 Feb 2022
Reinforcement Learning from Demonstrations by Novel Interactive Expert and Application to Automatic Berthing Control Systems for Unmanned Surface Vessel
Haoran Zhang
Chenkun Yin
Yanxin Zhang
S. Jin
Zhenxuan Li
OffRL
18
3
0
23 Feb 2022