Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.06347
Cited By
v1
v2 (latest)
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Proximal Policy Optimization Algorithms"
50 / 626 papers shown
Title
SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning
Jianye Xu
Pan Hu
Bassam Alrifaee
84
5
0
14 Aug 2024
TacSL: A Library for Visuotactile Sensor Simulation and Learning
Iretiayo Akinola
Jie Xu
Jan Carius
Dieter Fox
Yashraj S. Narang
112
10
0
12 Aug 2024
FORGE: Force-Guided Exploration for Robust Contact-Rich Manipulation under Uncertainty
Michael Noseworthy
Bingjie Tang
Bowen Wen
Ankur Handa
Nicholas Roy
Nicholas Roy
Dieter Fox
Yashraj S. Narang
Iretiayo Akinola
Iretiayo Akinola
101
11
0
08 Aug 2024
Model-Based Transfer Learning for Contextual Reinforcement Learning
Jung-Hoon Cho
Vindula Jayawardana
Sirui Li
Cathy Wu
OffRL
154
0
0
08 Aug 2024
Achieving Human Level Competitive Robot Table Tennis
David B. DÁmbrosio
Saminda Abeyruwan
L. Graesser
Atil Iscen
H. B. Amor
...
Vikas Sindhwani
Vincent Vanhoucke
Grace Vesom
P. Xu
Pannag R Sanketi
158
15
0
07 Aug 2024
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
Haozhe Ma
Zhengding Luo
Thanh Vinh Vo
Kuankuan Sima
Tze-Yun Leong
109
9
0
06 Aug 2024
Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning
Seyeon Kim
Joonhun Lee
Namhoon Cho
Sungjun Han
Seungeon Baek
111
0
0
05 Aug 2024
A Survey on Self-play Methods in Reinforcement Learning
Chao Yu
Zelai Xu
Chengdong Ma
Chao Yu
Weijuan Tu
...
Deheng Ye
Wenbo Ding
Yaodong Yang
Yu Wang
Yu Wang
SyDa
SSL
OnRL
139
9
0
02 Aug 2024
Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots
Pranay Dugar
Aayam Shrestha
Fangzhou Yu
Bart Jaap van Marum
Alan Fern
79
11
0
30 Jul 2024
SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning
Jianpeng Yao
Xiaopan Zhang
Yu Xia
Zejin Wang
Amit K. Roy-Chowdhury
Jiachen Li
228
2
0
24 Jul 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
87
0
0
23 Jul 2024
Sustainable broadcasting in Blockchain Networks with Reinforcement Learning
Danila Valko
Daniel Kudenko
90
0
0
22 Jul 2024
ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems
Yi Zhang
Ruihong Qiu
Jiajun Liu
Sen Wang
OffRL
92
0
0
18 Jul 2024
Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali
Zhang-Wei Hong
Ayush Sekhari
Alexander Rakhlin
Pulkit Agrawal
234
3
0
18 Jul 2024
PersLLM: A Personified Training Approach for Large Language Models
Zheni Zeng
Jiayi Chen
Haotian Chen
Yukun Yan
Yuxuan Chen
Zhenghao Liu
Zhiyuan Liu
Maosong Sun
LLMAG
120
2
0
17 Jul 2024
InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains
Yinzhu Quan
Zefang Liu
LLMAG
98
4
0
16 Jul 2024
Learning to Steer Markovian Agents under Model Uncertainty
Jiawei Huang
Vinzenz Thoma
Zebang Shen
H. Nax
Niao He
111
2
0
14 Jul 2024
Gradient Boosting Reinforcement Learning
Benjamin Fuhrer
Chen Tessler
Gal Dalal
OffRL
AI4CE
160
3
0
11 Jul 2024
RoboMorph: Evolving Robot Morphology using Large Language Models
Kevin Qiu
Krzysztof Ciebiera
Krzysztof Ciebiera
Marek Cygan
Marek Cygan
Łukasz Kuciński
LM&Ro
128
1
0
11 Jul 2024
Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing
Jessica Yin
Haozhi Qi
Jitendra Malik
James Pikul
Mark H. Yim
Tess Hellebrekers
95
7
0
10 Jul 2024
Can Learned Optimization Make Reinforcement Learning Less Difficult?
Alexander David Goldie
Chris Xiaoxuan Lu
Matthew Jackson
Shimon Whiteson
Jakob N. Foerster
122
5
0
09 Jul 2024
Variational Best-of-N Alignment
Afra Amini
Tim Vieira
Ryan Cotterell
Ryan Cotterell
BDL
85
23
0
08 Jul 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
146
26
0
05 Jul 2024
HAF-RM: A Hybrid Alignment Framework for Reward Model Training
Shujun Liu
Xiaoyu Shen
Yuhang Lai
Siyuan Wang
Shengbin Yue
Zengfeng Huang
Xuanjing Huang
Zhongyu Wei
96
1
0
04 Jul 2024
A Role of Environmental Complexity on Representation Learning in Deep Reinforcement Learning Agents
Andrew Liu
Alla Borisyuk
130
1
0
03 Jul 2024
AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment
Yilong Lai
Jialong Wu
Congzhi Zhang
Haowen Sun
Deyu Zhou
114
4
0
02 Jul 2024
Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning
Yuanyuan Duan
Haiyang Feng
Zhiping Yu
Hanming Wu
Leilai Shao
Xiaolei Zhu
26
0
0
02 Jul 2024
Generation of Geodesics with Actor-Critic Reinforcement Learning to Predict Midpoints
Kazumi Kasaura
121
0
0
02 Jul 2024
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
116
0
0
30 Jun 2024
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang
Dian Yu
Baolin Peng
Linfeng Song
Ye Tian
Mingyue Huo
Nan Jiang
Haitao Mi
Dong Yu
208
18
0
30 Jun 2024
Safety through feedback in Constrained RL
Shashank Reddy Chirra
Pradeep Varakantham
P. Paruchuri
OffRL
102
1
0
28 Jun 2024
LCSim: A Large-Scale Controllable Traffic Simulator
Yuheng Zhang
Tianjian Ouyang
Fudan Yu
Cong Ma
Lei Qiao
Jingtao Ding
Jian Yuan
Yong Li
109
2
0
28 Jun 2024
Text2Robot: Evolutionary Robot Design from Text Descriptions
Ryan P. Ringel
Zachary S. Charlick
Jiaxun Liu
Boxi Xia
Boyuan Chen
140
2
0
28 Jun 2024
Preference Elicitation for Offline Reinforcement Learning
Alizée Pace
Bernhard Schölkopf
Gunnar Rätsch
Giorgia Ramponi
OffRL
118
1
0
26 Jun 2024
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Bolian Li
Yifan Wang
A. Grama
Ruqi Zhang
Ruqi Zhang
AI4TS
115
15
0
24 Jun 2024
Large Language Models Assume People are More Rational than We Really are
Ryan Liu
Jiayi Geng
Joshua C. Peterson
Ilia Sucholutsky
Thomas Griffiths
160
20
0
24 Jun 2024
CAVE: Controllable Authorship Verification Explanations
Sahana Ramnath
Kartik Pandey
Elizabeth Boschee
Xiang Ren
129
2
0
24 Jun 2024
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou
Abdalgader Abubaker
Hakim Hacid
LRM
105
5
0
23 Jun 2024
Advantage Alignment Algorithms
Juan Agustin Duque
Milad Aghajohari
Tim Cooijmans
Tianyu Zhang
Rameswar Panda
Gauthier Gidel
Aaron Courville
61
2
0
20 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
127
7
0
20 Jun 2024
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang
Shiqi Shen
Guangyao Shen
Zhi Gong
Yankai Lin
Zhi Gong
Yankai Lin
Ji-Rong Wen
100
15
0
17 Jun 2024
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Yulai Zhao
Masatoshi Uehara
Gabriele Scalia
Tommaso Biancalani
Sergey Levine
Ehsan Hajiramezanali
Ehsan Hajiramezanali
AI4CE
134
7
0
17 Jun 2024
P-TA: Using Proximal Policy Optimization to Enhance Tabular Data Augmentation via Large Language Models
Shuo Yang
Chenchen Yuan
Yao Rong
Felix Steinbauer
Gjergji Kasneci
79
1
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
175
33
0
17 Jun 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
202
2
0
15 Jun 2024
DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning
Zeyu Gao
Yao Mu
Jinye Qu
Mengkang Hu
Lingyue Guo
Ping Luo
Yanfeng Lu
Ping Luo
Shanghang Zhang
Yanfeng Lu
119
11
0
14 Jun 2024
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Sijia Chen
Yibo Wang
Yi-Feng Wu
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
Lijun Zhang
LLMAG
LRM
103
17
0
11 Jun 2024
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan
Yibo Miao
J. Li
Yipin Zhang
Jian Xie
Zhijie Deng
Dong Yan
93
13
0
11 Jun 2024
GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
Zhehua Zhou
Xuan Xie
Jiayang Song
Zhan Shu
Lei Ma
102
1
0
06 Jun 2024
Towards Dynamic Trend Filtering through Trend Point Detection with Reinforcement Learning
Jihyeon Seong
Sekwang Oh
Jaesik Choi
AI4TS
97
0
0
06 Jun 2024
Previous
1
2
3
...
10
11
12
13
9
Next