ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1506.02438
  4. Cited By
High-Dimensional Continuous Control Using Generalized Advantage
  Estimation
v1v2v3v4v5v6 (latest)

High-Dimensional Continuous Control Using Generalized Advantage Estimation

8 June 2015
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
    OffRL
ArXiv (abs)PDFHTML

Papers citing "High-Dimensional Continuous Control Using Generalized Advantage Estimation"

50 / 77 papers shown
Title
What Can RL Bring to VLA Generalization? An Empirical Study
What Can RL Bring to VLA Generalization? An Empirical Study
Jijia Liu
Feng Gao
Bingwen Wei
Xinlei Chen
Qingmin Liao
Yi Wu
Chao Yu
Yu Wang
OffRL
243
0
0
26 May 2025
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Guanxing Lu
Wenkai Guo
Chubin Zhang
Yuheng Zhou
Haonan Jiang
Zifeng Gao
Yansong Tang
Ziwei Wang
OffRL
100
0
0
24 May 2025
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Fanqi Wan
Weizhou Shen
Shengyi Liao
Yingcheng Shi
Chenliang Li
Ziyi Yang
Ji Zhang
Fei Huang
Jingren Zhou
Ming Yan
OffRLLLMAGReLMLRM
92
0
0
23 May 2025
Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration
Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration
Jingtong Gao
Ling Pan
Yejing Wang
Rui Zhong
Chi Lu
Qingpeng Cai
Peng Jiang
Xiangyu Zhao
LRM
69
1
0
23 May 2025
Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective
Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective
Jintian Shao
YiMing Cheng
Hongyi Huang
Beiwen Zhang
ZhiYu Wu
You Shan
Mingkai Zheng
LRM
71
0
0
23 May 2025
PPO-BR: Dual-Signal Entropy-Reward Adaptation for Trust Region Policy Optimization
PPO-BR: Dual-Signal Entropy-Reward Adaptation for Trust Region Policy Optimization
Ben Rahman
63
0
0
23 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRLLRM
172
8
0
30 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAGReLMLRM
143
5
0
27 Apr 2025
Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
Shuang Wang
Haoyang Zhang
Tianxing Wu
Yize Zhang
W. Zhang
Quan Z. Sheng
60
0
0
27 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
205
30
0
24 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
117
5
0
21 Apr 2025
RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
Lv Qingsong
Yangning Li
Zihua Lan
Zishan Xu
Jiwei Tang
Hai-Tao Zheng
Wenhao Jiang
Wanshi Xu
Philip S. Yu
159
2
0
09 Apr 2025
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Yu Yue
Yufeng Yuan
Qiying Yu
Xiaochen Zuo
Ruofei Zhu
...
Ru Zhang
Xin Liu
Mingxuan Wang
Yonghui Wu
Lin Yan
OffRLLRM
117
38
0
07 Apr 2025
Concise Reasoning via Reinforcement Learning
Concise Reasoning via Reinforcement Learning
Mehdi Fatemi
Banafsheh Rafiee
Mingjie Tang
Kartik Talamadupula
ReLMOffRLLRM
132
17
0
07 Apr 2025
Reinforcement Learning-based Token Pruning in Vision Transformers: A Markov Game Approach
Reinforcement Learning-based Token Pruning in Vision Transformers: A Markov Game Approach
Chenglong Lu
Shen Liang
Xiang Wang
Wei Wang
ViTOffRL
126
0
0
30 Mar 2025
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
Chen Li
Nazhou Liu
Kai Yang
110
10
0
20 Mar 2025
Quantization-Free Autoregressive Action Transformer
Quantization-Free Autoregressive Action Transformer
Ziyad Sheebaelhamd
Michael Tschannen
Michael Muehlebach
Claire Vernade
90
1
0
18 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRLLRM
200
213
0
18 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Dong Wang
Sercan O. Arik
Dong Wang
Hamed Zamani
Jiawei Han
RALMReLMKELMOffRLAI4TSLRM
200
120
0
12 Mar 2025
Reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach
Reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach
Steeven Janny
Hervé Poirier
L. Antsfeld
G. Bono
G. Monaci
Boris Chidlovskii
Francesco Giuliari
Alessio Del Bue
Christian Wolf
LM&Ro
167
0
0
11 Mar 2025
Multi-Fidelity Policy Gradient Algorithms
Multi-Fidelity Policy Gradient Algorithms
Xinjie Liu
Cyrus Neary
Kushagra Gupta
Christian Ellis
Ufuk Topcu
David Fridovich-Keil
OffRL
464
0
0
07 Mar 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
Jun Wang
Kun Shao
OffRL
104
29
0
24 Feb 2025
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Zehan Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
108
1
0
20 Feb 2025
Task Aware Dreamer for Task Generalization in Reinforcement Learning
Task Aware Dreamer for Task Generalization in Reinforcement Learning
Chengyang Ying
Zhongkai Hao
Xinning Zhou
Hang Su
Songming Liu
Dong Yan
Jun Zhu
217
3
0
17 Feb 2025
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
Guoxin Chen
Minpeng Liao
Peiying Yu
Dingmin Wang
Zile Qiao
Chao Yang
Xin Zhao
Kai Fan
93
1
0
10 Feb 2025
Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning
Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning
Kaixi Bao
Chenhao Li
Yarden As
Andreas Krause
Marco Hutter
OffRLCLL
221
1
0
03 Feb 2025
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi
Xiao-Chang Liu
Iat Long Iong
Hanyu Lai
Xingwu Sun
...
Shuntian Yao
Tianjie Zhang
Wei Xu
J. Tang
Yuxiao Dong
172
40
0
28 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Wentao Zhang
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
191
25
0
21 Jan 2025
Blockchain-assisted Demonstration Cloning for Multi-Agent Deep Reinforcement Learning
Blockchain-assisted Demonstration Cloning for Multi-Agent Deep Reinforcement Learning
Ahmed Alagha
Jamal Bentahar
Hadi Otrok
Shakti Singh
R. Mizouni
101
3
0
19 Jan 2025
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
Zixuan Chen
Jing Huo
Yangtao Chen
Yang Gao
136
4
0
11 Jan 2025
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing
Vernon Luk
Jean Oh
149
1
0
16 Dec 2024
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Zhiwei Jia
Yuesong Nan
Huixi Zhao
Gengdai Liu
EGVM
159
1
0
22 Nov 2024
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos
Tomás Almeida
Daniel Vareta
Filipe Azevedo
Sweta Agrawal
Patrick Fernandes
André F. T. Martins
104
4
0
08 Nov 2024
Embedding Safety into RL: A New Take on Trust Region Methods
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic
Johannes Müller
Nico Scherf
92
2
0
05 Nov 2024
L3Ms -- Lagrange Large Language Models
L3Ms -- Lagrange Large Language Models
Guneet S. Dhillon
Xingjian Shi
Yee Whye Teh
Alex Smola
441
0
0
28 Oct 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
102
3
0
25 Oct 2024
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Q. Gu
Lin Yan
Quanquan Gu
Lin Yan
OffRLLRM
117
13
0
11 Oct 2024
Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots
Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots
Milad Farjadnasab
Shahin Sirouspour
94
0
0
08 Oct 2024
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information
Junjie Chen
Xiangheng He
Yusuke Miyao
Danushka Bollegala
84
0
0
03 Oct 2024
Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning
Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning
Xingyu Wang
Jin Zhou
Yuanli Feng
Jiahao Mei
Jiming Chen
Shuo Li
81
1
0
25 Sep 2024
Human-Robot Cooperative Distribution Coupling for Hamiltonian-Constrained Social Navigation
Human-Robot Cooperative Distribution Coupling for Hamiltonian-Constrained Social Navigation
Weizheng Wang
Chao Yu
Yu Wang
Byung-Cheol Min
395
2
0
20 Sep 2024
HUMOS: Human Motion Model Conditioned on Body Shape
HUMOS: Human Motion Model Conditioned on Body Shape
Shashank Tripathi
Omid Taheri
Christoph Lassner
Michael J. Black
Daniel Holden
Carsten Stoll
3DHDiffM
120
8
0
05 Sep 2024
Efficient Multi-agent Navigation with Lightweight DRL Policy
Efficient Multi-agent Navigation with Lightweight DRL Policy
Xingrong Diao
Jiankun Wang
97
0
0
29 Aug 2024
What makes math problems hard for reinforcement learning: a case study
What makes math problems hard for reinforcement learning: a case study
Ali Shehper
A. Medina-Mardones
Lucas Fagan
Angus Gruen
Piotr Kucharski
Sergei Gukov
Piotr Kucharski
Zhenghan Wang
Sergei Gukov
59
3
0
27 Aug 2024
A Survey on Self-play Methods in Reinforcement Learning
A Survey on Self-play Methods in Reinforcement Learning
Chao Yu
Zelai Xu
Chengdong Ma
Chao Yu
Weijuan Tu
...
Deheng Ye
Wenbo Ding
Yaodong Yang
Yu Wang
Yu Wang
SyDaSSLOnRL
100
9
0
02 Aug 2024
SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning
SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning
Jianpeng Yao
Xiaopan Zhang
Yu Xia
Zejin Wang
Amit K. Roy-Chowdhury
Jiachen Li
221
2
0
24 Jul 2024
RoboMorph: Evolving Robot Morphology using Large Language Models
RoboMorph: Evolving Robot Morphology using Large Language Models
Kevin Qiu
Krzysztof Ciebiera
Krzysztof Ciebiera
Marek Cygan
Marek Cygan
Łukasz Kuciński
LM&Ro
124
1
0
11 Jul 2024
Gradient Boosting Reinforcement Learning
Gradient Boosting Reinforcement Learning
Benjamin Fuhrer
Chen Tessler
Gal Dalal
OffRLAI4CE
153
3
0
11 Jul 2024
Can Learned Optimization Make Reinforcement Learning Less Difficult?
Can Learned Optimization Make Reinforcement Learning Less Difficult?
Alexander David Goldie
Chris Xiaoxuan Lu
Matthew Jackson
Shimon Whiteson
Jakob N. Foerster
100
4
0
09 Jul 2024
Advantage Alignment Algorithms
Advantage Alignment Algorithms
Juan Agustin Duque
Milad Aghajohari
Tim Cooijmans
Tianyu Zhang
Rameswar Panda
Gauthier Gidel
Aaron Courville
58
2
0
20 Jun 2024
12
Next