Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1502.05477
Cited By
Trust Region Policy Optimization
19 February 2015
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Trust Region Policy Optimization"
50 / 1,109 papers shown
Title
CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic
Huaiyuan Yao
Longchao Da
Vishnu Nandam
Justin Turnau
Zhiwei Liu
Linsey Pang
Hua Wei
LLMAG
65
4
0
10 Jan 2025
Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy
Keru Chen
Honghao Wei
Zhigang Deng
Sen Lin
OffRL
OnRL
94
0
0
31 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
123
2
0
10 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jun Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
134
7
0
05 Dec 2024
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
A. Jain
Harley Wiltzer
Jesse Farebrother
Irina Rish
Glen Berseth
Sanjiban Choudhury
57
1
0
11 Nov 2024
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
Zhihong Liu
Xin Xu
Peng Qiao
Dongsheng Li
OffRL
22
2
0
08 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation
Bowen Li
Zhaoyu Li
Qiwei Du
Jinqi Luo
Wenshan Wang
...
Katia P. Sycara
Pradeep Kumar Ravikumar
Alexander G. Gray
X. Si
Sebastian A. Scherer
AI4CE
LRM
81
3
0
01 Nov 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Rameswar Panda
OffRL
82
5
0
23 Oct 2024
Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space
Rajesh Mangannavar
Gopalakrishnan Srinivasaraghavan
25
2
0
15 Oct 2024
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Ge Li
Dong Tian
Hongyi Zhou
Xinkai Jiang
Rudolf Lioutikov
Gerhard Neumann
OffRL
199
3
0
12 Oct 2024
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation
Marina Sheshukova
Denis Belomestny
Alain Durmus
Eric Moulines
Alexey Naumov
S. Samsonov
35
1
0
07 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
61
3
0
06 Oct 2024
GreenLight-Gym: Reinforcement learning benchmark environment for control of greenhouse production systems
Bart van Laatum
Eldert J. van Henten
Sjoerd Boersma
OffRL
74
0
0
06 Oct 2024
C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front
Ruohong Liu
Yuxin Pan
Linjie Xu
Lei Song
Jiang Bian
Pengcheng You
Yize Chen
40
1
0
03 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua Wu
174
1
0
03 Oct 2024
Revisiting inverse Hessian vector products for calculating influence functions
Yegor Klochkov
Yang Liu
TDI
LLMSV
34
1
0
25 Sep 2024
On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
A. Farid
Jafar Roshanian
Malek Mouhoub
26
1
0
17 Sep 2024
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation
Asen Nachkov
Danda Pani Paudel
Luc Van Gool
34
0
0
12 Sep 2024
QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE
Junjie Zhao
Chengxi Zhang
Min Qin
Peng Yang
OOD
31
3
0
08 Sep 2024
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam
Dionysis Kalogerias
29
0
0
02 Sep 2024
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Zihao Sheng
Zilin Huang
Sikai Chen
39
8
0
30 Aug 2024
RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models
Pritthijit Nath
Henry Moss
Emily Shuckburgh
Mark Webb
AI4Cl
AI4CE
37
0
0
28 Aug 2024
The Evolution of Reinforcement Learning in Quantitative Finance: A Survey
Nikolaos Pippas
Cagatay Turkay
Elliot A. Ludvig
AIFin
92
3
0
20 Aug 2024
SAPG: Split and Aggregate Policy Gradients
Jayesh Singla
Ananye Agarwal
Deepak Pathak
OffRL
OnRL
42
3
0
29 Jul 2024
Reinforcement Learning for Sustainable Energy: A Survey
Koen Ponse
Felix Kleuker
Márton Fejér
Álvaro Serra-Gómez
Aske Plaat
Thomas M. Moerland
OffRL
AI4CE
40
1
0
26 Jul 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
30
0
0
23 Jul 2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification
Thomas Kwa
Drake Thomas
Adrià Garriga-Alonso
32
1
0
19 Jul 2024
A reinforcement learning strategy to automate and accelerate h/p-multigrid solvers
D. Huergo
Laura Alonso
S. Joshi
Adrian Juanicoteca
Gonzalo Rubio
Esteban Ferrer
AI4CE
28
4
0
18 Jul 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal
Shipra Agrawal
56
3
0
18 Jul 2024
Preference-Guided Reinforcement Learning for Efficient Exploration
Guojian Wang
Faguo Wu
Xiao Zhang
Tianyuan Chen
Xuyang Chen
Lin Zhao
40
0
0
09 Jul 2024
Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation
Detian Chu
Linyuan Bai
Jianuo Huang
Zhenlong Fang
Peng Zhang
Wei Kang
Haifeng Lin
45
2
0
08 Jul 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
0
0
08 Jul 2024
The Impact of Quantization and Pruning on Deep Reinforcement Learning Models
Heng Lu
Mehdi Alemi
Reza Rawassizadeh
39
1
0
05 Jul 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
62
15
0
05 Jul 2024
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
40
0
0
30 Jun 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
62
14
0
24 Jun 2024
Advantage Alignment Algorithms
Juan Agustin Duque
Milad Aghajohari
Tim Cooijmans
Tianyu Zhang
Rameswar Panda
Gauthier Gidel
Aaron Courville
28
0
0
20 Jun 2024
Unifying Interpretability and Explainability for Alzheimer's Disease Progression Prediction
Raja Farrukh Ali
Stephanie Milani
John Woods
Emmanuel Adenij
Ayesha Farooq
Clayton Mansel
Jeffrey Burns
William Hsu
33
0
0
11 Jun 2024
CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning
Zeyuan Liu
Kai Yang
Xiu Li
OffRL
44
0
0
11 Jun 2024
An Improved Empirical Fisher Approximation for Natural Gradient Descent
Xiaodong Wu
Wenyi Yu
Chao Zhang
Philip Woodland
29
3
0
10 Jun 2024
GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
Zhehua Zhou
Xuan Xie
Jiayang Song
Zhan Shu
Lei Ma
47
1
0
06 Jun 2024
Reinforcement learning-based architecture search for quantum machine learning
Frederic Rapp
D. Kreplin
Marco F. Huber
M. Roth
AI4CE
34
5
0
04 Jun 2024
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Linjiajie Fang
Ruoxue Liu
Jing Zhang
Wenjia Wang
Bing-Yi Jing
OffRL
56
2
0
31 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
46
2
0
30 May 2024
Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees
Dohyeong Kim
Taehyun Cho
Seung Han
Hojun Chung
Kyungjae Lee
Songhwai Oh
34
1
0
29 May 2024
Counterfactual Explanations for Multivariate Time-Series without Training Datasets
Xiangyu Sun
Raquel Aoki
Kevin H. Wilson
27
1
0
28 May 2024
Matrix Low-Rank Trust Region Policy Optimization
Sergio Rozada
Antonio G. Marques
43
0
0
27 May 2024
Mimicry and the Emergence of Cooperative Communication
Dylan R. Cope
Peter McBurney
35
0
0
26 May 2024
Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning
Shangding Gu
Bilgehan Sel
Yuhao Ding
Lu Wang
Qingwei Lin
Alois Knoll
Ming Jin
42
1
0
26 May 2024
Previous
1
2
3
4
5
...
21
22
23
Next