Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL
arXiv:1707.06347

Papers citing "Proximal Policy Optimization Algorithms"

50 / 626 papers shown
Title
Integrative Decoding: Improve Factuality via Implicit Self-consistency
Yi Cheng
Xiao Liang
Yeyun Gong
Wen Xiao
Song Wang
...
Wenjie Li
Jian Jiao
Qi Chen
Peng Cheng
Wayne Xiong
HILM
131
3
0
02 Oct 2024
FlashMask: Efficient and Rich Mask Extension of FlashAttention
Guoxia Wang
Jinle Zeng
Xiyuan Xiao
Siming Wu
Jiabin Yang
Lujing Zheng
Zeyu Chen
Jiang Bian
Dianhai Yu
Haifeng Wang
374
3
0
02 Oct 2024
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Ghada Sokar
J. Obando-Ceron
Rameswar Panda
Hugo Larochelle
Pablo Samuel Castro
MoE
323
7
0
02 Oct 2024
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
122
8
0
02 Oct 2024
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Angela Lopez-Cardona
Carlos Segura
Alexandros Karatzoglou
Sergi Abadal
Ioannis Arapakis
ALM
151
4
0
02 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
186
25
0
01 Oct 2024
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
195
28
0
01 Oct 2024
Learning to Swim: Reinforcement Learning for 6-DOF Control of Thruster-driven Autonomous Underwater Vehicles
Levi Cai
Kevin Chang
Yogesh A. Girdhar
138
3
0
30 Sep 2024
PersonalLLM: Tailoring LLMs to Individual Preferences
Thomas P. Zollo
Andrew Siah
Naimeng Ye
Ang Li
Hongseok Namkoong
100
13
0
30 Sep 2024
Enabling Multi-Robot Collaboration from Single-Human Guidance
Zhengran Ji
Lingyu Zhang
Paul Sajda
Boyuan Chen
66
2
0
30 Sep 2024
CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models
Kanghyun Ryu
Qiayuan Liao
Zhongyu Li
Koushil Sreenath
Negar Mehr
LM&Ro
349
4
0
27 Sep 2024
Learning Occlusion-aware Decision-making from Agent Interaction via Active Perception
Jie Jia
Yiming Shu
Zhongxue Gan
Wenchao Ding
84
2
0
26 Sep 2024
Offline and Distributional Reinforcement Learning for Radio Resource Management
Eslam Eldeeb
Hirley Alves
OffRL
87
2
0
25 Sep 2024
Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning
Xingyu Wang
Jin Zhou
Yuanli Feng
Jiahao Mei
Jiming Chen
Shuo Li
90
1
0
25 Sep 2024
A Learning Framework for Diverse Legged Robot Locomotion Using Barrier-Based Style Rewards
Gijeong Kim
Yong-Hoon Lee
Hae-Won Park
151
6
0
24 Sep 2024
Whole-body End-Effector Pose Tracking
Tifanny Portela
Andrei Cramariuc
Mayank Mittal
Marco Hutter
100
4
0
24 Sep 2024
NavRL: Learning Safe Flight in Dynamic Environments
Zhefan Xu
Xinming Han
Haoyu Shen
Hanyu Jin
Kenji Shimada
113
7
0
24 Sep 2024
Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning
Min Tan
Yushun Tao
Boyun Zheng
GaoSheng Xie
Lijuan Feng
Zeyang Xia
Jing Xiong
92
0
0
24 Sep 2024
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu
Wei Xiong
Jie Jessie Ren
Lichang Chen
Junru Wu
...
Yuan Liu
Bilal Piot
Abe Ittycheriah
Aviral Kumar
Mohammad Saleh
AAML
91
23
0
20 Sep 2024
Human-Robot Cooperative Distribution Coupling for Hamiltonian-Constrained Social Navigation
Weizheng Wang
Chao Yu
Yu Wang
Byung-Cheol Min
410
2
0
20 Sep 2024
ProxFly: Robust Control for Close Proximity Quadcopter Flight via Residual Reinforcement Learning
Ruiqi Zhang
Dingqi Zhang
Mark W. Mueller
356
1
0
20 Sep 2024
Disentangling Recognition and Decision Regrets in Image-Based Reinforcement Learning
Alihan Hüyük
A. R. Koblitz
Atefeh Mohajeri
M. Andrews
OffRL
118
0
0
19 Sep 2024
From Lists to Emojis: How Format Bias Affects Model Alignment
Xuanchang Zhang
Wei Xiong
Lichang Chen
Dinesh Manocha
Heng Huang
Tong Zhang
ALM
102
13
0
18 Sep 2024
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Maojia Song
Shang Hong Sim
Rishabh Bhardwaj
Hai Leong Chieu
Navonil Majumder
Soujanya Poria
109
12
0
17 Sep 2024
Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids
Caio Fabio Oliveira da Silva
Azita Dabiri
B. de Schutter
95
4
0
17 Sep 2024
Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning
Daniel Flögel
Marcos Gómez Villafane
Joshua Ransiek
Sören Hohmann
166
0
0
16 Sep 2024
Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling
Jesse van Remmerden
Zaharah Bukhsh
Yingqian Zhang
OffRL, OnRL
116
1
0
16 Sep 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
128
1
0
16 Sep 2024
PIP-Loco: A Proprioceptive Infinite Horizon Planning Framework for Quadrupedal Robot Locomotion
Aditya Shirwatkar
Naman Saxena
Kishore Chandra
Shishir Kolathaya
111
4
0
14 Sep 2024
AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models
Yifei Yao
Wentao He
Chenyu Gu
Jiaheng Du
Fuwei Tan
Zhen Zhu
Junguo Lu
OffRL
110
2
0
13 Sep 2024
One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion
Nico Bohlinger
Grzegorz Czechmanowski
Maciej Krupka
Piotr Kicki
Krzysztof Walas
Jan Peters
Davide Tateo
96
21
0
10 Sep 2024
CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning
John Birkbeck
Adam Sobey
Federico Cerutti
Katherine Heseltine Hurley Flynn
Timothy J. Norman
74
0
0
05 Sep 2024
HUMOS: Human Motion Model Conditioned on Body Shape
Shashank Tripathi
Omid Taheri
Christoph Lassner
Michael J. Black
Daniel Holden
Carsten Stoll
3DH, DiffM
143
8
0
05 Sep 2024
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam
Dionysis Kalogerias
114
0
0
02 Sep 2024
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Zihao Sheng
Zilin Huang
Sikai Chen
82
10
0
30 Aug 2024
Learning Multi-agent Multi-machine Tending by Mobile Robots
Abdalwhab Abdalwhab
Giovanni Beltrame
Samira Ebrahimi Kahou
David St-Onge
138
1
0
29 Aug 2024
Efficient Multi-agent Navigation with Lightweight DRL Policy
Xingrong Diao
Jiankun Wang
105
0
0
29 Aug 2024
Remove Symmetries to Control Model Expressivity and Improve Optimization
Liu Ziyin
Yizhou Xu
Isaac Chuang
AAML
98
4
0
28 Aug 2024
RAIN: Reinforcement Algorithms for Improving Numerical Weather and Climate Models
Pritthijit Nath
Henry Moss
Emily Shuckburgh
Mark Webb
AI4Cl, AI4CE
137
0
0
28 Aug 2024
What makes math problems hard for reinforcement learning: a case study
Ali Shehper
A. Medina-Mardones
Lucas Fagan
Angus Gruen
Piotr Kucharski
Zhenghan Wang
Sergei Gukov
67
3
0
27 Aug 2024
Diffusion Models Are Real-Time Game Engines
Dani Valevski
Yaniv Leviathan
Moab Arar
Shlomi Fruchter
DiffM, VGen, AI4CE
119
91
0
27 Aug 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang
Philip Torr
Mohamed Elhoseiny
Adel Bibi
184
15
0
27 Aug 2024
LSR-IGRU: Stock Trend Prediction Based on Long Short-Term Relationships and Improved GRU
Peng Zhu
Yuante Li
Yifan Hu
Qinyuan Liu
Dawei Cheng
Yuqi Liang
AIFin, AI4TS
135
5
0
26 Aug 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei
Shenghua He
Tian Xia
Andy H. Wong
Jingyang Lin
Mei Han
ALM, ELM
164
32
0
23 Aug 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
140
6
0
22 Aug 2024
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction
Anthony GX-Chen
Kenneth Marino
Rob Fergus
OCL
114
1
0
21 Aug 2024
Personality Alignment of Large Language Models
Minjun Zhu
Linyi Yang
Yue Zhang
ALM
117
8
0
21 Aug 2024
The Evolution of Reinforcement Learning in Quantitative Finance: A Survey
Nikolaos Pippas
Cagatay Turkay
Elliot A. Ludvig
AIFin
173
3
0
20 Aug 2024
Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter
Junhao Chen
Bowen Wang
Zhouqiang Jiang
Yuta Nakashima
80
1
0
20 Aug 2024
Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion
Achref Jaziri
Etienne Kunzel
Visvanathan Ramesh
CLL
100
0
0
19 Aug 2024