ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1502.05477
  4. Cited By
Trust Region Policy Optimization
v1v2v3v4v5 (latest)

Trust Region Policy Optimization

19 February 2015
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
ArXiv (abs)PDFHTML

Papers citing "Trust Region Policy Optimization"

50 / 2,008 papers shown
Title
Learn A Flexible Exploration Model for Parameterized Action Markov Decision Processes
Zijian Wang
Bin Wang
Mingwen Shao
Hongbo Dou
Boxiang Tao
115
0
0
06 Jan 2025
RFPPO: Motion Dynamic RRT based Fluid Field - PPO for Dynamic TF/TA Routing Planning
RFPPO: Motion Dynamic RRT based Fluid Field - PPO for Dynamic TF/TA Routing Planning
Rongkun Xue
Jing Yang
Yuyang Jiang
Yiming Feng
Zi Yang
101
0
0
31 Dec 2024
Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing
Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing
Yue Jin
Shuangqing Wei
Giovanni Montana
163
0
0
16 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
264
6
0
10 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
265
16
0
05 Dec 2024
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
A. Jain
Harley Wiltzer
Jesse Farebrother
Irina Rish
Glen Berseth
Sanjiban Choudhury
143
2
0
11 Nov 2024
Structure Matters: Dynamic Policy Gradient
Structure Matters: Dynamic Policy Gradient
Sara Klein
Xiangyuan Zhang
Tamer Basar
Simon Weissmann
Leif Döring
59
0
0
07 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
234
6
0
07 Nov 2024
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation
Bowen Li
Zhaoyu Li
Qiwei Du
Jinqi Luo
Wenshan Wang
...
Katia Sycara
Pradeep Kumar Ravikumar
Alexander G. Gray
X. Si
Sebastian A. Scherer
AI4CELRM
161
5
0
01 Nov 2024
David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training
David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training
Weijian Luo
C. Zhang
Debing Zhang
Zhengyang Geng
101
4
0
28 Oct 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Rameswar Panda
OffRL
183
11
0
23 Oct 2024
Exploiting Risk-Aversion and Size-dependent fees in FX Trading with
  Fitted Natural Actor-Critic
Exploiting Risk-Aversion and Size-dependent fees in FX Trading with Fitted Natural Actor-Critic
Vito Alessandro Monaco
Antonio Riva
Luca Sabbioni
L. Bisi
Edoardo Vittori
Marco Pinciroli
Michele Trapletti
Marcello Restelli
24
0
0
15 Oct 2024
Learning Agents With Prioritization and Parameter Noise in Continuous
  State and Action Space
Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space
Rajesh Mangannavar
Gopalakrishnan Srinivasaraghavan
41
2
0
15 Oct 2024
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
Bokai Hu
Sai Ashish Somayajula
Xin Pan
Zihan Huang
OffRL
33
1
0
14 Oct 2024
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Ge Li
Dong Tian
Hongyi Zhou
Xinkai Jiang
Rudolf Lioutikov
Gerhard Neumann
OffRL
540
4
0
12 Oct 2024
Reinforcement Learning From Imperfect Corrective Actions And Proxy
  Rewards
Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards
Zhaohui Jiang
Xuening Feng
Paul Weng
Yifei Zhu
Yan Song
Tianze Zhou
Yujing Hu
Tangjie Lv
Changjie Fan
135
1
0
08 Oct 2024
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation
Marina Sheshukova
Denis Belomestny
Alain Durmus
Eric Moulines
Alexey Naumov
S. Samsonov
77
1
0
07 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
147
7
0
06 Oct 2024
C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front
C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front
Ruohong Liu
Yuxin Pan
Linjie Xu
Lei Song
Jiang Bian
Pengcheng You
Yize Chen
106
2
0
03 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua Wu
485
4
0
03 Oct 2024
HybridFlow: A Flexible and Efficient RLHF Framework
HybridFlow: A Flexible and Efficient RLHF Framework
Guangming Sheng
Chi Zhang
Zilingfeng Ye
Xibin Wu
Wang Zhang
Ru Zhang
Size Zheng
Haibin Lin
Chuan Wu
AI4CE
238
240
0
28 Sep 2024
Autoregressive Policy Optimization for Constrained Allocation Tasks
Autoregressive Policy Optimization for Constrained Allocation Tasks
David Winkel
Niklas Strauß
Maximilian Bernhard
Zongyue Li
Thomas Seidl
Matthias Schubert
63
0
0
27 Sep 2024
QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE
QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE
Junjie Zhao
Chengxi Zhang
Min Qin
Peng Yang
OOD
112
5
0
08 Sep 2024
Compatible Gradient Approximations for Actor-Critic Algorithms
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam
Dionysis Kalogerias
139
0
0
02 Sep 2024
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Zihao Sheng
Zilin Huang
Sikai Chen
96
10
0
30 Aug 2024
RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models
RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models
Pritthijit Nath
Henry Moss
Emily Shuckburgh
Mark Webb
AI4ClAI4CE
171
0
0
28 Aug 2024
Advances in Preference-based Reinforcement Learning: A Review
Advances in Preference-based Reinforcement Learning: A Review
Youssef Abdelkareem
Shady Shehata
Fakhri Karray
OffRL
98
10
0
21 Aug 2024
The Evolution of Reinforcement Learning in Quantitative Finance: A Survey
The Evolution of Reinforcement Learning in Quantitative Finance: A Survey
Nikolaos Pippas
Cagatay Turkay
Elliot A. Ludvig
AIFin
198
4
0
20 Aug 2024
A Comparison of Imitation Learning Algorithms for Bimanual Manipulation
A Comparison of Imitation Learning Algorithms for Bimanual Manipulation
Michael Drolet
Simon Stepputtis
Siva Kailas
Ajinkya Jain
Jan Peters
S. Schaal
H. B. Amor
81
10
0
13 Aug 2024
GFlowNet Training by Policy Gradients
GFlowNet Training by Policy Gradients
Puhua Niu
Shili Wu
Mingzhou Fan
Xiaoning Qian
145
3
0
12 Aug 2024
Functional Acceleration for Policy Mirror Descent
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
110
0
0
23 Jul 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal
Shipra Agrawal
125
6
0
18 Jul 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
125
1
0
08 Jul 2024
Simplifying Deep Temporal Difference Learning
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
180
26
0
05 Jul 2024
Towards shutdownable agents via stochastic choice
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
147
0
0
30 Jun 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
136
23
0
24 Jun 2024
Advantage Alignment Algorithms
Advantage Alignment Algorithms
Juan Agustin Duque
Milad Aghajohari
Tim Cooijmans
Tianyu Zhang
Rameswar Panda
Gauthier Gidel
Aaron Courville
84
2
0
20 Jun 2024
CDSA: Conservative Denoising Score-based Algorithm for Offline
  Reinforcement Learning
CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning
Zeyuan Liu
Kai Yang
Xiu Li
OffRL
111
0
0
11 Jun 2024
An Improved Empirical Fisher Approximation for Natural Gradient Descent
An Improved Empirical Fisher Approximation for Natural Gradient Descent
Xiaodong Wu
Wenyi Yu
Chao Zhang
Philip Woodland
84
5
0
10 Jun 2024
GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
Zhehua Zhou
Xuan Xie
Jiayang Song
Zhan Shu
Lei Ma
125
1
0
06 Jun 2024
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Linjiajie Fang
Ruoxue Liu
Jing Zhang
Wenjia Wang
Bing-Yi Jing
OffRL
181
7
0
31 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
135
2
0
30 May 2024
No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning
No DtrainD_{\text{train}}Dtrain​: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning
Xiangyu Sun
Raquel Aoki
Kevin H. Wilson
70
1
0
28 May 2024
Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales
Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales
Ju-Seung Byun
Andrew Perrault
57
1
0
27 May 2024
Mimicry and the Emergence of Cooperative Communication
Mimicry and the Emergence of Cooperative Communication
Dylan R. Cope
Peter McBurney
89
0
0
26 May 2024
Bayesian Optimization of Functions over Node Subsets in Graphs
Bayesian Optimization of Functions over Node Subsets in Graphs
Huidong Liang
Xingchen Wan
Xiaowen Dong
114
1
0
24 May 2024
MeMo: Meaningful, Modular Controllers via Noise Injection
MeMo: Meaningful, Modular Controllers via Noise Injection
Megan Tjandrasuwita
Jie Xu
Armando Solar-Lezama
Wojciech Matusik
101
0
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
337
54
0
23 May 2024
Reinforcement learning
Reinforcement learning
Florentin Wörgötter
123
2,528
0
16 May 2024
Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement
  Learning
Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning
Haobin Zhang
Zhuang Yang
68
0
0
08 May 2024
Previous
123456...394041
Next