Trust Region Policy Optimization

19 February 2015
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel

Papers citing "Trust Region Policy Optimization"

50 / 2,008 papers shown
Autonomous Control of Redundant Hydraulic Manipulator Using Reinforcement Learning with Action Feedback
Rohit Dhakate
Christian Brommer
C. Böhm
Stephan Weiss
J. Steinbrener
74
5
0
22 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL, LRM
191
17
0
21 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
167
5
0
21 Apr 2025
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Jiyuan Shi
Xinzhe Liu
Dewei Wang
Ouyang Lu
Sören Schwertfeger
Fuchun Sun
Chenjia Bai
Xiaochen Li
127
2
0
19 Apr 2025
HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
Jiakai Tang
Jingsen Zhang
Zihang Tian
Xueyang Feng
Lei Wang
Xu Chen
OffRL
410
0
0
19 Apr 2025
Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots
Zhe Chen
Yan Xia
Jiayuan Liu
Jijia Liu
Wenhao Tang
...
Hongen Liao
Yu-Ping Wang
Chao Yu
Boyu Zhang
Fei Xing
49
1
0
18 Apr 2025
Measures of Variability for Risk-averse Policy Gradient
Yudong Luo
Yangchen Pan
Jiaqi Tan
Pascal Poupart
85
0
0
15 Apr 2025
Follow the STARs: Dynamic $ω$-Regular Shielding of Learned Policies
Ashwani Anand
Satya Prakash Nayak
Ritam Raha
Anne-Kathrin Schmuck
50
0
0
11 Apr 2025
An Information-Geometric Approach to Artificial Curiosity
Alexander Nedergaard
Pablo A. Morales
133
0
0
08 Apr 2025
A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks
L. Felizardo
Edoardo Fadda
Paolo Brandimarte
E. Del-Moral-Hernandez
Mariá Cristina Vasconcelos Nascimento
OffRL
81
0
0
07 Apr 2025
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
Jincheng Mei
Bo Dai
Alekh Agarwal
Mohammad Ghavamzadeh
Csaba Szepesvári
Dale Schuurmans
143
4
0
02 Apr 2025
Nuclear Microreactor Control with Deep Reinforcement Learning
Leo Tunkle
Kamal Abdulraheem
Linyu Lin
M. Radaideh
95
0
0
31 Mar 2025
RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations
Enrico Marchesini
Benjamin Donnot
Constance Crozier
Ian Dytham
Christian Merz
Lars Schewe
Nico Westerbeck
Cathy Wu
Antoine Marot
P. Donti
OffRL
93
1
0
29 Mar 2025
On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations
Rajdeep Singh Hundal
Yan Xiao
Xiaochun Cao
Jin Song Dong
Manuel Rigger
139
0
0
28 Mar 2025
Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning
Yongshuai Liu
Xin Liu
209
1
0
26 Mar 2025
One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF
Xin Cai
99
1
0
25 Mar 2025
Evolutionary Policy Optimization
Jianren Wang
Yifan Su
Abhinav Gupta
Deepak Pathak
86
0
0
24 Mar 2025
Adventurer: Exploration with BiGAN for Deep Reinforcement Learning
Yongshuai Liu
Xin Liu
GAN
195
2
0
24 Mar 2025
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming
Minori Narita
Ryo Kuroiwa
J. Christopher Beck
89
0
0
20 Mar 2025
Active management of battery degradation in wireless sensor network using deep reinforcement learning for group battery replacement
Jong-Hyun Jeong
Hongki Jo
Qiang Zhou
Tahsin Afroz Hoque Nishat
Lang Wu
62
1
0
20 Mar 2025
RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models
Parham Saremi
Amar Kumar
Mohammed Mohammed
Zahra Tehraninasab
Tal Arbel
LM&MA, MedIm
93
1
0
20 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Bo Liu
Yunxiang Li
Yangqiu Song
Hanjing Wang
Linyi Yang
...
Jun Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG, KELM, LRM, AI4CE
134
11
0
12 Mar 2025
Safe Explicable Policy Search
Akkamahadevi Hanni
Jonathan Montaño
Yu Zhang
124
0
0
10 Mar 2025
Probabilistic Shielding for Safe Reinforcement Learning
Edwin Hamel-De le Court
Francesco Belardinelli
Alex W. Goodall
116
0
0
09 Mar 2025
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Hyungkyu Kang
Min-hwan Oh
OffRL
121
0
0
07 Mar 2025
A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning
Shashank Gupta
Chaitanya Ahuja
Tsung-Yu Lin
Sreya Dutta Roy
Harrie Oosterhuis
Maarten de Rijke
Satya Narayan Shukla
119
2
0
02 Mar 2025
Unifying Model Predictive Path Integral Control, Reinforcement Learning, and Diffusion Models for Optimal Control and Planning
Yankai Li
Mo Chen
AI4CE
117
0
0
27 Feb 2025
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning
Jaehyeon Son
Soochan Lee
Gunhee Kim
OffRL
135
4
0
26 Feb 2025
AMPO: Active Multi-Preference Optimization for Self-play Preference Selection
Taneesh Gupta
Rahul Madhavan
Xuchao Zhang
Chetan Bansal
Saravan Rajmohan
115
0
0
25 Feb 2025
Safe Multi-Agent Navigation guided by Goal-Conditioned Safe Reinforcement Learning
Meng Feng
Viraj Parimi
B. Williams
134
2
0
25 Feb 2025
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
Soumik Sarkar
112
0
0
21 Feb 2025
RobotIQ: Empowering Mobile Robots with Human-Level Planning for Real-World Execution
Emmanuel K. Raptis
Athanasios Ch. Kapoutsis
Elias B. Kosmatopoulos
LM&Ro
125
0
0
18 Feb 2025
Reward-Safety Balance in Offline Safe RL via Diffusion Regularization
Junyu Guo
Zhi Zheng
Donghao Ying
Ming Jin
Shangding Gu
C. Spanos
Javad Lavaei
OffRL
202
0
0
18 Feb 2025
Convergence of Policy Mirror Descent Beyond Compatible Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
70
0
0
16 Feb 2025
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
Jincheng Mei
Bo Dai
Alekh Agarwal
Sharan Vaswani
Anant Raj
Csaba Szepesvári
Dale Schuurmans
136
0
0
11 Feb 2025
Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures
Ashab Uddin
Ahmed Hamdi Sakr
Ning Zhang
OffRL
108
0
0
10 Feb 2025
Mirror Descent Actor Critic via Bounded Advantage Learning
Ryo Iwaki
137
0
0
06 Feb 2025
Circular Microalgae-Based Carbon Control for Net Zero
Federico Zocco
Joan García
W. Haddad
198
1
0
04 Feb 2025
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods
Oussama Zekri
Nicolas Boullé
DiffM
155
4
0
03 Feb 2025
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao
Haoxian Chen
Ji Zhang
D. Yao
Wenpin Tang
154
1
0
03 Feb 2025
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning
Haque Ishfaq
Guangyuan Wang
Sami Nur Islam
Doina Precup
130
4
0
29 Jan 2025
Low-altitude Friendly-Jamming for Satellite-Maritime Communications via Generative AI-enabled Deep Reinforcement Learning
Jiawei Huang
Aimin Wang
Geng Sun
Jiahui Li
Jiacheng Wang
Dusit Niyato
Victor C. M. Leung
115
0
0
28 Jan 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
174
16
0
28 Jan 2025
ABPT: Amended Backpropagation through Time with Partially Differentiable Rewards
Fanxing Li
Fangyu Sun
Tianbao Zhang
Danping Zou
95
0
0
24 Jan 2025
State Combinatorial Generalization In Decision Making With Conditional Diffusion Models
Xintong Duan
Yutong He
Fahim Tajwar
Wen-Tse Chen
Ruslan Salakhutdinov
Jeff Schneider
OffRL, AI4CE
153
1
0
22 Jan 2025
An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management
Eslam Eldeeb
Hirley Alves
OffRL
125
0
0
22 Jan 2025
HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation
Hazem Taha
Ameer M. S. Abdelhadi
70
1
0
22 Jan 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang
Zhuokai Zhao
Yibo Jiang
Zhaorun Chen
Chen Zhu
...
Jiayi Liu
Lizhu Zhang
Xiangjun Fan
Hao Ma
Sinong Wang
189
5
0
16 Jan 2025
CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic
Huaiyuan Yao
Longchao Da
Vishnu Nandam
Justin Turnau
Zhiwei Liu
Linsey Pang
Hua Wei
LLMAG
146
6
0
10 Jan 2025
CuRLA: Curriculum Learning Based Deep Reinforcement Learning for Autonomous Driving
Bhargava Uppuluri
Anjel Patel
Neil Mehta
Sridhar Kamath
Pratyush Chakraborty
122
1
0
10 Jan 2025