1707.06347
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Papers citing "Proximal Policy Optimization Algorithms"
50 / 8,597 papers shown
Title
GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics
Qianzhong Chen
Jiankai Sun
Naixiang Gao
JunEn Low
Timothy Chen
Mac Schwager
135
1
0
01 Jul 2025
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning
Yanzhi Zhang
Zhaoxi Zhang
Haoxiang Guan
Yilin Cheng
Yitong Duan
Chen Wang
Yue Wang
Shuxin Zheng
Jiyan He
ReLM
LRM
30
0
0
20 Jun 2025
Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement Learning
Chengpeng Hu
Ziming Wang
Bo Yuan
Jialin Liu
Chengqi Zhang
Xin Yao
10
0
0
20 Jun 2025
Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation
Kosuke Nakanishi
Akihiro Kubo
Yuji Yasui
Shin Ishii
AAML
OffRL
10
0
0
20 Jun 2025
Learning Accurate Whole-body Throwing with High-frequency Residual Policy and Pullback Tube Acceleration
Yuntao Ma
Yang Liu
Kaixian Qu
Marco Hutter
9
0
0
20 Jun 2025
Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators
Marco Jiralerspong
E. Derman
Danilo Vucetic
Nikolay Malkin
Bilun Sun
Tianyu Zhang
Pierre-Luc Bacon
Gauthier Gidel
OffRL
9
0
0
20 Jun 2025
Learning Dexterous Object Handover
Daniel Frau-Alfaro
Julio Castaño-Amorós
S. T. Puente
Pablo Gil
Roberto Calandra
10
0
0
20 Jun 2025
VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning
Zhangyang Qi
Zhixiong Zhang
Yizhou Yu
Jiaqi Wang
Hengshuang Zhao
LM&Ro
AI4TS
41
0
0
20 Jun 2025
Elevating Styled Mahjong Agents with Learning from Demonstration
Lingfeng Li
Yunlong Lu
Yongyi Wang
Wenxin Li
LLMAG
7
0
0
20 Jun 2025
RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought
Junbo Qiao
Miaomiao Cai
Wei Li
Y. Liu
X. Y. Huang
Gaoqi He
Jiao Xie
Jie Hu
X. Chen
Shaohui Lin
SupR
VLM
LRM
36
0
0
20 Jun 2025
Probing the Robustness of Large Language Models Safety to Latent Perturbations
Tianle Gu
Kexin Huang
Zongqi Wang
Yixu Wang
Jie Li
Yuanqi Yao
Yang Yao
Yujiu Yang
Yan Teng
Yingchun Wang
AAML
LLMSV
26
0
0
19 Jun 2025
KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping
Kowndinya Boyalakuntla
Abdeslam Boularias
Jingjin Yu
12
0
0
19 Jun 2025
Goal-conditioned Hierarchical Reinforcement Learning for Sample-efficient and Safe Autonomous Driving at Intersections
Yiou Huang
5
0
0
19 Jun 2025
Investigating Lagrangian Neural Networks for Infinite Horizon Planning in Quadrupedal Locomotion
Prakrut Kotecha
Aditya Shirwatkar
Shishir Kolathaya
12
0
0
19 Jun 2025
GFlowGR: Fine-tuning Generative Recommendation Frameworks with Generative Flow Networks
Y. X. R. Wang
Shengyu Zhou
Jinyu Lu
Qidong Liu
Xinhang Li
...
Feng Li
Pengjie Wang
Jian Xu
Bo Zheng
Xiangyu Zhao
12
0
0
19 Jun 2025
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
William Sharpless
Dylan Hirsch
S. Tonkens
Nikhil Shinde
Sylvia Herbert
7
0
0
19 Jun 2025
Data-Driven Policy Mapping for Safe RL-based Energy Management Systems
Theo Zangato
A. Osmani
Pegah Alizadeh
10
0
0
19 Jun 2025
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi
Fan Nie
Alexandre Alahi
James Zou
Himabindu Lakkaraju
Yilun Du
Eric P. Xing
Sham Kakade
Hanlin Zhang
21
1
0
19 Jun 2025
Arch-Router: Aligning LLM Routing with Human Preferences
Co Tran
Salman Paracha
Adil Hafeez
Shuguang Chen
10
0
0
19 Jun 2025
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
17
0
0
18 Jun 2025
Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study
Mohamad Abdul Hady
Siyi Hu
Mahardhika Pratama
Jimmy Cao
Ryszard Kowalczyk
12
0
0
18 Jun 2025
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Feng He
Zijun Chen
Xinnian Liang
Tingting Ma
Yunqi Qiu
Shuangzhi Wu
Junchi Yan
LRM
60
0
0
18 Jun 2025
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
Zijian Zhou
Ao Qu
Zhaoxuan Wu
Sunghwan Kim
Alok Prakash
Daniela Rus
Jinhua Zhao
Bryan Kian Hsiang Low
Paul Liang
LLMAG
OffRL
LRM
10
0
0
18 Jun 2025
Efficient Navigation Among Movable Obstacles using a Mobile Manipulator via Hierarchical Policy Learning
Taegeun Yang
Jiwoo Hwang
Jeil Jeong
Minsung Yoon
Sung-Eui Yoon
36
0
0
18 Jun 2025
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Andrew Wagenmaker
Mitsuhiko Nakamoto
Yunchu Zhang
S. Park
Waleed Yagoub
Anusha Nagabandi
Abhishek Gupta
Sergey Levine
OffRL
15
0
0
18 Jun 2025
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization
Ranting Hu
OffRL
25
0
0
18 Jun 2025
Sequential Policy Gradient for Adaptive Hyperparameter Optimization
Zheng Li
Jerry Q. Cheng
Huanying Gu
OffRL
12
0
0
18 Jun 2025
Quantum Fisher-Preconditioned Reinforcement Learning: From Single-Qubit Control to Rayleigh-Fading Link Adaptation
Oluwaseyi Giwa
Muhammad Ahmed Mohsin
Muhammad Ali Jamshed
OnRL
15
0
0
18 Jun 2025
AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning
Tevin Wang
Chenyan Xiong
LRM
32
0
0
18 Jun 2025
Truncated Proximal Policy Optimization
Tiantian Fan
L. J. Liu
Yu Yue
Jiaze Chen
C. Wang
...
Zhi-Li Zhang
Xin Liu
Mingxuan Wang
Lin Yan
Yonghui Wu
OffRL
LRM
12
0
0
18 Jun 2025
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
Roger Creus Castanyer
J. Obando-Ceron
Lu Li
Pierre-Luc Bacon
Glen Berseth
Aaron Courville
Pablo Samuel Castro
19
0
0
18 Jun 2025
Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
Zongxia Li
Yapei Chang
Yuhang Zhou
Xiyang Wu
Zichao Liang
Yoo Yeon Sung
Jordan L. Boyd-Graber
19
0
0
18 Jun 2025
Booster Gym: An End-to-End Reinforcement Learning Framework for Humanoid Robot Locomotion
Yushi Wang
Penghui Chen
Xinyu Han
Feng Wu
Mingguo Zhao
OffRL
12
0
0
18 Jun 2025
RecBayes: Recurrent Bayesian Ad Hoc Teamwork in Large Partially Observable Domains
Joao G. Ribeiro
Yaniv Oren
Alberto Sardinha
M. Spaan
Francisco S. Melo
5
0
0
18 Jun 2025
Efficient and Generalizable Environmental Understanding for Visual Navigation
Ruoyu Wang
Xinshu Li
Chen Wang
Lina Yao
CML
9
0
0
18 Jun 2025
GMT: General Motion Tracking for Humanoid Whole-Body Control
Zixuan Chen
Mazeyu Ji
Xuxin Cheng
Xuanbin Peng
Xue Bin Peng
Xiaolong Wang
24
0
0
17 Jun 2025
Can Pretrained Vision-Language Embeddings Alone Guide Robot Navigation?
Nitesh Subedi
Adam Haroon
Shreyan Ganguly
Samuel T.K. Tetteh
Prajwal Koirala
Cody Fleming
Soumik Sarkar
LM&Ro
35
0
0
17 Jun 2025
Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng
Shaohan Huang
Xuekai Zhu
Bo Dai
Wayne Xin Zhao
Zhenliang Zhang
Furu Wei
LRM
17
0
0
17 Jun 2025
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Xumeng Wen
Zihan Liu
Shun Zheng
Zhijian Xu
Shengyu Ye
...
Yang Wang
Junjie Li
Ziming Miao
Jiang Bian
Mao Yang
LRM
24
0
0
17 Jun 2025
Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent
Xueyang Feng
Jingsen Zhang
Jiakai Tang
Wei Li
Guohao Cai
X. Chen
Quanyu Dai
Y. Zhu
Zhenhua Dong
17
0
0
17 Jun 2025
Human-Centered Editable Speech-to-Sign-Language Generation via Streaming Conformer-Transformer and Resampling Hook
Yingchao Li
SLR
39
0
0
17 Jun 2025
ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes
Zeyuan Chen
Qiyang Yan
Yuanpei Chen
Tianhao Wu
Jiyao Zhang
Zihan Ding
Jinzhou Li
Yaodong Yang
Hao Dong
15
0
0
17 Jun 2025
Common Benchmarks Undervalue the Generalization Power of Programmatic Policies
Amirhossein Rajabpour
Kiarash Aghakasiri
Sandra Zilles
Levi H. S. Lelis
OffRL
20
0
0
17 Jun 2025
Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs
Gyutaek Oh
Seoyeon Kim
Sangjoon Park
Byung-Hoon Kim
LM&MA
LRM
19
0
0
16 Jun 2025
Scaling Algorithm Distillation for Continuous Control with Mamba
Samuel Beaussant
Mehdi Mounsif
17
0
0
16 Jun 2025
Dynamic Preference Multi-Objective Reinforcement Learning for Internet Network Management
DongNyeong Heo
Daniela N. Rim
Heeyoul Choi
17
0
0
16 Jun 2025
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
David Bani-Harouni
Chantal Pellegrini
Ege Özsoy
Matthias Keicher
Nassir Navab
LLMAG
LM&MA
17
0
0
16 Jun 2025
BOW: Bottlenecked Next Word Exploration
Ming Shen
Zhikun Xu
Xiao Ye
Jacob Dineen
Ben Zhou
OffRL
LRM
19
0
0
16 Jun 2025
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks
Yifei Xu
Tusher Chakraborty
Srinagesh Sharma
Leonardo Nunes
Emre Kıcıman
Songwu Lu
Ranveer Chandra
OffRL
LRM
18
1
0
16 Jun 2025
Dynamic Reinsurance Treaty Bidding via Multi-Agent Reinforcement Learning
Stella C. Dong
James R. Finlay
7
0
0
16 Jun 2025