1707.06347
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Papers citing "Proximal Policy Optimization Algorithms"
50 / 6,857 papers shown
Title
Surrogate Fitness Metrics for Interpretable Reinforcement Learning
Philipp Altmann
Céline Davignon
Maximilian Zorn
Fabian Ritz
Claudia Linnhoff-Popien
Thomas Gabor
29
0
0
20 Apr 2025
Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension
Lin Li
Wei Chen
Jiahui Li
Lu Chen
LRM
48
1
0
20 Apr 2025
Deep Reinforcement Learning for Investor-Specific Portfolio Optimization: A Volatility-Guided Asset Selection Approach
Arishi Orra
Aryan Bhambu
Himanshu Choudhary
Manoj Thakur
Selvaraju Natarajan
14
0
0
20 Apr 2025
Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline
Hui Zhou
Shaoshuai Shi
Hongsheng Li
OffRL
32
0
0
20 Apr 2025
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction
Wenke Xia
Ruoxuan Feng
Dong Wang
Di Hu
32
0
0
20 Apr 2025
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Jiyuan Shi
Xinzhe Liu
Dewei Wang
Ouyang Lu
Sören Schwertfeger
Fuchun Sun
Chenjia Bai
X. Li
47
0
0
19 Apr 2025
HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
Jiakai Tang
Jingsen Zhang
Zihang Tian
Xueyang Feng
Lei Wang
Xu Chen
OffRL
192
0
0
19 Apr 2025
Optimal Lattice Boltzmann Closures through Multi-Agent Reinforcement Learning
Paul Fischer
Sebastian Kaltenbach
Sergey Litvinov
Sauro Succi
Petros Koumoutsakos
AI4CE
31
0
0
19 Apr 2025
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Shihan Dou
Muling Wu
Jingwen Xu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL
LRM
32
0
0
19 Apr 2025
Quantum-Enhanced Reinforcement Learning for Power Grid Security Assessment
Benjamin M. Peter
Mert Korkali
31
0
0
19 Apr 2025
Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization
Shouwei Ruan
Zhenyu Wu
Yao Huang
Ruochen Zhang
Yitong Sun
Caixin Kang
Xingxing Wei
EGVM
53
0
0
19 Apr 2025
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Li He
He Zhao
Stephen Wan
Dadong Wang
Lina Yao
Tongliang Liu
38
0
0
19 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
61
21
0
18 Apr 2025
Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning
R. P. Singh
M. Morisawa
M. Benallegue
Zhaoming Xie
F. Kanehiro
27
0
0
18 Apr 2025
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Yixuan Even Xu
Yash Savani
Fei Fang
Zico Kolter
OffRL
42
2
0
18 Apr 2025
Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots
Zhengzhang Chen
Yan Xia
Jiayuan Liu
Jijia Liu
Wenhao Tang
...
Hongen Liao
Yu-Ping Wang
Chao Yu
Boyu Zhang
Fei Xing
26
1
0
18 Apr 2025
Compile Scene Graphs with Reinforcement Learning
Zuyao Chen
Jinlin Wu
Zhen Lei
Marc Pollefeys
Chang Wen Chen
OffRL
LRM
57
0
0
18 Apr 2025
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni
Jiachen Pu
Zhongyi Yang
Kun Zhou
Hui Wang
Xiaoliang Xiao
Dakui Wang
Xin Li
Jingfeng Luo
Conggang Hu
39
0
0
18 Apr 2025
Imperative MPC: An End-to-End Self-Supervised Learning with Differentiable MPC for UAV Attitude Control
Haonan He
Yuheng Qiu
Junyi Geng
86
0
0
17 Apr 2025
Recursive Deep Inverse Reinforcement Learning
Paul Ghanem
Michael Potter
Owen Howell
Pau Closas
A. Ramezani
Deniz Erdogmus
Tales Imbiriba
32
0
0
17 Apr 2025
CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation
Elahe Khatibi
Ziyu Wang
Amir M. Rahmani
48
0
0
17 Apr 2025
Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Tyler Ga Wei Lum
Olivia Y. Lee
C. Karen Liu
Jeannette Bohg
45
1
0
17 Apr 2025
Aligning Constraint Generation with Design Intent in Parametric CAD
Evan Casey
Tianyu Zhang
Shu Ishida
John Roger Thompson
Amir Hosein Khasahmadi
Joseph George Lambourne
P. Jayaraman
K. Willis
38
0
0
17 Apr 2025
Evolutionary Policy Optimization
Zelal Su "Lain" Mustafaoglu
Keshav Pingali
Risto Miikkulainen
31
0
0
17 Apr 2025
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab
Ruqi Zhang
188
0
0
17 Apr 2025
TraCeS: Trajectory Based Credit Assignment From Sparse Safety Feedback
Siow Meng Low
Akshat Kumar
48
0
0
17 Apr 2025
Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
Xiaotian Zhang
Ruizhe Chen
Yang Feng
Zuozhu Liu
45
0
0
17 Apr 2025
Science-T2I: Addressing Scientific Illusions in Image Synthesis
Jialuo Li
Wenhao Chai
Xingyu Fu
Haiyang Xu
Saining Xie
MedIm
45
0
0
17 Apr 2025
UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
Yunsheng Wu
Korrawe Karunratanakul
Zhengyi Luo
Siyu Tang
DiffM
VGen
AI4CE
52
0
0
17 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
194
1
0
17 Apr 2025
Evolutionary Reinforcement Learning for Interpretable Decision-Making in Supply Chain Management
Stefano Genetti
Alberto Longobardi
Giovanni Iacca
55
0
0
16 Apr 2025
Evaluating the Diversity and Quality of LLM Generated Content
Alexander Shypula
Shuo Li
Botong Zhang
Vishakh Padmakumar
Kayo Yin
Osbert Bastani
53
1
0
16 Apr 2025
A Graph-Based Reinforcement Learning Approach with Frontier Potential Based Reward for Safe Cluttered Environment Exploration
Gabriele Calzolari
Vidya Sumathy
Christoforos Kanellakis
G. Nikolakopoulos
201
0
0
16 Apr 2025
pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild
Jonas Myhre Schiøtt
Viktor Sebastian Petersen
Dimitrios P. Papadopoulos
VLM
35
0
0
16 Apr 2025
ToolRL: Reward is All Tool Learning Needs
Cheng Qian
Emre Can Acikgoz
Qi He
Hongru Wang
Xiusi Chen
Dilek Hakkani-Tur
Gokhan Tur
Heng Ji
OffRL
LRM
38
7
0
16 Apr 2025
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision
Zihui Zhang
Yafei Yang
Hongtao Wen
Bo Yang
3DPC
45
0
0
16 Apr 2025
R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors
Haoyang Wang
Liming Liu
Peiheng Wang
Junlin Hao
Jiangkai Wu
Xinggong Zhang
26
0
0
16 Apr 2025
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection
Yuhao Chao
Jie Liu
J. Tang
Gangshan Wu
37
1
0
16 Apr 2025
Control of Rayleigh-Bénard Convection: Effectiveness of Reinforcement Learning in the Turbulent Regime
Thorben Markmann
Michiel Straat
Sebastian Peitz
Barbara Hammer
AI4CE
38
0
0
16 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM
LRM
AI4CE
50
2
0
16 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
38
0
0
16 Apr 2025
Dynamic Compressing Prompts for Efficient Inference of Large Language Models
Jinwu Hu
Feiyu Xiong
Yufeng Wang
Yu Hu
Bin Xiao
Mingkui Tan
Qing Du
31
1
0
15 Apr 2025
Data driven approach towards more efficient Newton-Raphson power flow calculation for distribution grids
Shengyuan Yan
Farzad Vazinram
Zeynab Kaseb
Lindsay Spoor
Jochen Stiasny
...
Amirhossein Heydarian Ardakani
Ugochukwu Orji
Pedro P. Vergara
Yu Xiang
Jerry Guo
39
1
0
15 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xueliang Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
67
6
0
15 Apr 2025
Measures of Variability for Risk-averse Policy Gradient
Yudong Luo
Yangchen Pan
Jiaqi Tan
Pascal Poupart
45
0
0
15 Apr 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Wei Xiong
Jiarui Yao
Yuhui Xu
Bo Pang
Lei Wang
...
Junnan Li
Nan Jiang
Tong Zhang
Caiming Xiong
Hanze Dong
OffRL
LRM
48
10
0
15 Apr 2025
A Rollout-Based Algorithm and Reward Function for Efficient Resource Allocation in Business Processes
Jeroen Middelhuis
Z. Bukhsh
Ivo Adan
R. Dijkman
29
0
0
15 Apr 2025
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Divyansh Garg
Shaun VanWeelden
Diego Caples
Andis Draguns
Nikil Ravi
...
Youngchul Joo
Jindong Gu
Charles London
Christian Schroeder de Witt
S. Motwani
47
1
0
15 Apr 2025
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Jiazhan Feng
Shijue Huang
Xingwei Qu
Ge Zhang
Yujia Qin
Baoquan Zhong
Chengquan Jiang
Jinxin Chi
Wanjun Zhong
OffRL
ReLM
SyDa
KELM
LRM
59
8
0
15 Apr 2025
Deep Reasoning Translation via Reinforcement Learning
Jiaan Wang
Fandong Meng
Jie Zhou
OffRL
LRM
33
0
0
14 Apr 2025