Trust Region Policy Optimization

19 February 2015
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel

Papers citing "Trust Region Policy Optimization"

50 / 2,008 papers shown
Planning under Uncertainty to Goal Distributions
Adam Conkey
Tucker Hermans
84
3
0
01 Jul 2025
Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation
Kosuke Nakanishi
Akihiro Kubo
Yuji Yasui
Shin Ishii
AAML OffRL
28
0
0
20 Jun 2025
Data-Driven Policy Mapping for Safe RL-based Energy Management Systems
Theo Zangato
A. Osmani
Pegah Alizadeh
20
0
0
19 Jun 2025
BIDA: A Bi-level Interaction Decision-making Algorithm for Autonomous Vehicles in Dynamic Traffic Scenarios
Liyang Yu
Tianyi Wang
Junfeng Jiao
Fengwu Shan
Hongqing Chu
B. Gao
15
0
0
19 Jun 2025
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization
Ranting Hu
OffRL
38
0
0
18 Jun 2025
Quantum Fisher-Preconditioned Reinforcement Learning: From Single-Qubit Control to Rayleigh-Fading Link Adaptation
Oluwaseyi Giwa
Muhammad Ahmed Mohsin
Muhammad Ali Jamshed
OnRL
20
0
0
18 Jun 2025
Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents
LeCheng Zhang
Yuanshi Wang
Haotian Shen
Xujie Wang
LLMAG
28
0
0
15 Jun 2025
Resolve Highway Conflict in Multi-Autonomous Vehicle Controls with Local State Attention
Xuan Duy Ta
Bang Giang Le
Thanh Ha Le
Viet-Cuong Ta
20
0
0
13 Jun 2025
DoublyAware: Dual Planning and Policy Awareness for Temporal Difference Learning in Humanoid Locomotion
Khang Nguyen
An T. Le
Jan Peters
Minh Nhat Vu
25
0
0
12 Jun 2025
Provable Sim-to-Real Transfer via Offline Domain Randomization
Arnaud Fickinger
Abderrahim Bendahi
Stuart J. Russell
OffRL
46
0
0
11 Jun 2025
On a few pitfalls in KL divergence gradient estimation for RL
Yunhao Tang
Rémi Munos
64
0
0
11 Jun 2025
TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning
Songze Li
Mingxuan Zhang
Kang Wei
Shouling Ji
AAML
92
0
0
11 Jun 2025
Time-Aware World Model for Adaptive Prediction and Control
Anh N. Nhu
Sanghyun Son
Ming-Chyuan Lin
AI4TS TTA
38
0
0
10 Jun 2025
Intention-Conditioned Flow Occupancy Models
Chongyi Zheng
S. Park
Sergey Levine
Benjamin Eysenbach
AI4TS OffRL AI4CE
48
0
0
10 Jun 2025
Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms
Haoran Peng
Ying-Jun Angela Zhang
20
0
0
10 Jun 2025
ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
Daolang Huang
Xinyi Wen
Ayush Bharti
Samuel Kaski
Luigi Acerbi
28
0
0
08 Jun 2025
Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning
Eshwar S. R.
Gugan Thoppe
Aditya Gopalan
Gal Dalal
20
0
0
08 Jun 2025
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning
Motoki Omura
Kazuki Ota
Takayuki Osa
Yusuke Mukuta
Tatsuya Harada
OffRL
48
0
0
06 Jun 2025
Reusing Trajectories in Policy Gradients Enables Fast Convergence
Alessandro Montenegro
Federico Mansutti
Marco Mussi
Matteo Papini
Alberto Maria Metelli
OnRL
86
0
0
06 Jun 2025
When Maximum Entropy Misleads Policy Optimization
Ruipeng Zhang
Ya-Chien Chang
Sicun Gao
48
0
0
05 Jun 2025
Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving
Li Zeqiao
Wang Yijing
Wang Haoyu
Li Zheng
Li Peng
Zuo zhiqiang
Hu Chuan
118
0
0
04 Jun 2025
The Actor-Critic Update Order Matters for PPO in Federated Reinforcement Learning
Zhijie Xie
Shenghui Song
55
0
0
02 Jun 2025
Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning
Yizhuo Zhang
Heng Wang
Shangbin Feng
Zhaoxuan Tan
Xinyun Liu
Yulia Tsvetkov
OffRL
82
0
0
01 Jun 2025
Reinforcement Learning with Random Time Horizons
Enric Ribera Borrell
Lorenz Richter
Christof Schütte
AI4TS
37
0
0
01 Jun 2025
CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
Ni Mu
Hao Hu
Xiao Hu
Yiqin Yang
Bo Xu
Qing-Shan Jia
59
0
0
31 May 2025
Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting
Wei Chen
Jiahao Zhang
Haipeng Zhu
Boyan Xu
Zijian Li
Keli Zhang
Junjian Ye
Ruichu Cai
43
1
0
30 May 2025
Contraction Actor-Critic: Contraction Metric-Guided Reinforcement Learning for Robust Path Tracking
Minjae Cho
Hiroyasu Tsukamoto
Huy Trong Tran
24
0
0
28 May 2025
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Ganqu Cui
Yuchen Zhang
Jiacheng Chen
Lifan Yuan
Zhi Wang
...
Lei Bai
Wanli Ouyang
Yu Cheng
Bowen Zhou
Ning Ding
LRM
90
5
0
28 May 2025
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley
Mingyu Chen
Zhaolin Gao
Jason D. Lee
Wen Sun
Wenhao Zhan
Xuezhou Zhang
OffRL LRM
88
1
0
27 May 2025
Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning
Zhuochen Liu
Rahul Jain
Quan Nguyen
44
0
0
25 May 2025
Improving Value Estimation Critically Enhances Vanilla Policy Gradient
Tao Wang
Ruipeng Zhang
Sicun Gao
OffRL
55
0
0
25 May 2025
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
Haoyuan Sun
Jiaqi Wu
Bo Xia
Yifu Luo
Yifei Zhao
Kai Qin
Xufei Lv
Tiantian Zhang
Yongzhe Chang
Xueqian Wang
OffRL LRM
212
0
0
24 May 2025
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL LRM
263
3
0
23 May 2025
PPO-BR: Dual-Signal Entropy-Reward Adaptation for Trust Region Policy Optimization
Ben Rahman
73
0
0
23 May 2025
Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning
Korel Gundem
Juncheng Dong
Dennis Zhang
Vahid Tarokh
Zhengling Qi
24
0
0
22 May 2025
A Temporal Difference Method for Stochastic Continuous Dynamics
Haruki Settai
Naoya Takeishi
Takehisa Yairi
165
0
0
21 May 2025
Runtime Safety through Adaptive Shielding: From Hidden Parameter Inference to Provable Guarantees
Minjae Kwon
Tyler Ingebrand
Ufuk Topcu
Lu Feng
26
0
0
20 May 2025
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Soumya Rani Samineni
Durgesh Kalwar
Karthik Valmeekam
Kaya Stechly
Subbarao Kambhampati
OffRL
111
1
0
19 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
113
1
0
18 May 2025
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning
Kalyan Cherukuri
Aarav Lala
Yash Yardi
54
0
0
17 May 2025
Bi-Level Policy Optimization with Nyström Hypergradients
Arjun Prakash
Naicheng He
Denizalp Goktas
Amy Greenwald
77
0
0
16 May 2025
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee
Lifan Yuan
Dilek Hakkani-Tur
Hao Peng
115
0
0
16 May 2025
Modular Robot Control with Motor Primitives
Moses C. Nah
Johannes Lachner
Neville Hogan
102
0
0
15 May 2025
LineFlow: A Framework to Learn Active Control of Production Lines
Kai Müller
Martin Wenzel
Tobias Windisch
AI4CE
65
0
0
10 May 2025
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho
Seokhun Ju
Seungyub Han
Dohyeong Kim
Kyungjae Lee
Jungwoo Lee
OffRL
125
0
0
06 May 2025
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL VLM
96
1
0
06 May 2025
Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator
Xuyang Chen
Jingliang Duan
Lin Zhao
97
1
0
02 May 2025
Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story
Vincenzo De Paola
Riccardo Zamboni
Mirco Mutti
Marcello Restelli
122
0
0
02 May 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
149
0
0
26 Apr 2025
Reinforcement learning framework for the mechanical design of microelectronic components under multiphysics constraints
S. Nair
Timothy F. Walsh
Greg Pickrell
Fabio Semperlotti
64
0
0
23 Apr 2025