1707.06347
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Papers citing "Proximal Policy Optimization Algorithms"
50 / 6,731 papers shown
DYSTIL: Dynamic Strategy Induction with Large Language Models for Reinforcement Learning
Borui Wang
Kathleen McKeown
Rex Ying
OffRL
39
0
0
06 May 2025
Joint Resource Management for Energy-efficient UAV-assisted SWIPT-MEC: A Deep Reinforcement Learning Approach
Yue Chen
Hui Kang
Jiahui Li
Geng Su
Boxiong Wang
Jiacheng Wang
Cong Liang
Shuang Liang
Dusit Niyato
49
0
0
06 May 2025
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu
Sheng Zhang
Guanghui Qin
Timothy Ossowski
Yu Gu
...
Sam Preston
Mu-Hsin Wei
Paul Vozila
Tristan Naumann
Hoifung Poon
OOD
LRM
VLM
59
1
0
06 May 2025
PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers
Michael Xu
Yi Shi
KangKang Yin
Xue Bin Peng
33
0
0
06 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
185
1
0
05 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
62
0
0
05 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Z. Zhang
Tingting Gao
Liang Wang
OffRL
LRM
46
0
0
05 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
34
0
0
05 May 2025
Automated Hybrid Reward Scheduling via Large Language Models for Robotic Skill Learning
Changxin Huang
Junyang Liang
Yanbin Chang
Jingzhao Xu
Jianqiang Li
34
0
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
AKD : Adversarial Knowledge Distillation For Large Language Models Alignment on Coding tasks
Ilyas Oulkadda
Julien Perez
ALM
47
0
0
05 May 2025
Enhancing LLMs' Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry
J. Kim
Chaeeun Shim
Sungjin Park
Su Yeon Lee
Gee Young Suh
...
Yong Soo Kim
Hee-Joon Bae
Sung Yoon Lim
Han-Gil Jeong
Edward Choi
LRM
51
0
0
05 May 2025
TWIST: Teleoperated Whole-Body Imitation System
Yanjie Ze
Zixuan Chen
Joao Pedro Araujo
Zi-ang Cao
Xue Bin Peng
Jiajun Wu
Chao Liu
36
1
0
05 May 2025
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
39
0
0
05 May 2025
Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing
Diji Yang
Linda Zeng
Jinmeng Rao
Yuyao Zhang
32
0
0
05 May 2025
Aerodynamic and structural airfoil shape optimisation via Transfer Learning-enhanced Deep Reinforcement Learning
David Ramos
Lucas Lacasa
E. Valero
G. Rubio
AI4CE
27
0
0
05 May 2025
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li
Daniel Khashabi
60
0
0
05 May 2025
Prompt-responsive Object Retrieval with Memory-augmented Student-Teacher Learning
Malte Mosbach
Sven Behnke
31
0
0
04 May 2025
Interpretable Emergent Language Using Inter-Agent Transformers
Mannan Bhardwaj
AI4CE
120
0
0
04 May 2025
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao
Bojian Hou
Zhanliang Wang
Ruochen Jin
Q. Long
Weijie Su
Li Shen
35
0
0
04 May 2025
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yiping Peng
Yunjie Ji
Han Zhao
Xiangang Li
OffRL
LRM
37
0
0
04 May 2025
SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations
Runyi Yu
Yinhuai Wang
Qihan Zhao
Hok Wai Tsui
Jingbo Wang
P. Tan
Qifeng Chen
OffRL
36
0
0
04 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
60
2
0
04 May 2025
Resolving Conflicting Constraints in Multi-Agent Reinforcement Learning with Layered Safety
Jason J. Choi
Jasmine Jerry Aloor
Jingqi Li
Maria G. Mendoza
H. Balakrishnan
Claire J. Tomlin
31
0
0
04 May 2025
A Generalised and Adaptable Reinforcement Learning Stopping Method
Reem Bin-Hezam
Mark Stevenson
29
0
0
03 May 2025
CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation
Mazal Bethany
Nishant Vishwamitra
Cho-Yu Chiang
Peyman Najafirad
AAML
31
0
0
03 May 2025
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
Jifeng Hu
Sili Huang
Zhengyuan Yang
Shengchao Hu
Li Shen
H. Chen
Lichao Sun
Yi-Ju Chang
Dacheng Tao
OffRL
179
0
0
03 May 2025
Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory
Huy Q. Ngo
Mingyu Guo
Hung Nguyen
AAML
31
0
0
02 May 2025
Fast Flow-based Visuomotor Policies via Conditional Optimal Transport Couplings
Andreas Sochopoulos
Nikolay Malkin
Nikolaos Tsagkas
João Moura
Michael Gienger
S. Vijayakumar
50
1
0
02 May 2025
Model Tensor Planning
An T. Le
K. Nguyen
Minh Nhat Vu
João Carvalho
Jan Peters
35
0
0
02 May 2025
Wasserstein Policy Optimization
David Pfau
Ian Davies
Diana Borsa
Joao G. M. Araujo
Brendan D. Tracey
H. V. Hasselt
29
0
0
01 May 2025
A General Approach of Automated Environment Design for Learning the Optimal Power Flow
Thomas Wolgast
Astrid Nieße
AI4CE
21
0
0
01 May 2025
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI
Lik Hang Kenny Wong
Xueyang Kang
Kaixin Bai
Jianwei Zhang
59
0
0
01 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
Yiming Li
LRM
72
2
0
01 May 2025
MULE: Multi-terrain and Unknown Load Adaptation for Effective Quadrupedal Locomotion
Vamshi Kumar Kurva
Shishir Kolathaya
26
0
0
01 May 2025
Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks
Xinyu Wang
Jinbo Bi
Minghu Song
CLL
69
0
0
01 May 2025
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Lang Feng
Weihao Tan
Zhiyi Lyu
Longtao Zheng
Haiyang Xu
M. Yan
Fei Huang
Jingyi Wang
29
0
0
01 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
58
1
0
30 Apr 2025
Whleaper: A 10-DOF Flexible Bipedal Wheeled Robot
Yinglei Zhu
Sixiao He
Zhenghao Qi
Zhuoyuan Yong
Yihua Qin
Jianyu Chen
29
0
0
30 Apr 2025
Designing Control Barrier Function via Probabilistic Enumeration for Safe Reinforcement Learning Navigation
Luca Marzari
Francesco Trotti
Enrico Marchesini
Alessandro Farinelli
48
0
0
30 Apr 2025
Neuro-Symbolic Generation of Explanations for Robot Policies with Weighted Signal Temporal Logic
Mikihisa Yuasa
R. Sreenivas
Huy T. Tran
42
0
0
30 Apr 2025
One Net to Rule Them All: Domain Randomization in Quadcopter Racing Across Different Platforms
Robin Ferede
Till Blaha
Erin Lucassen
Christophe De Wagter
Guido de Croon
34
0
0
30 Apr 2025
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Haoran Xu
Baolin Peng
Hany Awadalla
Dongdong Chen
Yen-Chun Chen
...
Yelong Shen
S. Wang
Weijian Xu
Jianfeng Gao
Weizhu Chen
ReLM
LRM
75
1
0
30 Apr 2025
LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning
Yiyang Shao
Xiaoyu Huang
Bike Zhang
Qiayuan Liao
Yuman Gao
Yufeng Chi
Zhongyu Li
Sophia Shao
K. Sreenath
LM&Ro
190
0
0
30 Apr 2025
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
Jingyang Yi
Jiazheng Wang
Sida Li
ReLM
OODD
LRM
174
2
0
30 Apr 2025
Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning
Feiyu Lu
Mengyu Chen
Hsiang Hsu
Pranav Deshpande
Cheng Yao Wang
Blair MacIntyre
32
3
0
30 Apr 2025
A Domain-Agnostic Scalable AI Safety Ensuring Framework
Beomjun Kim
Kangyeon Kim
Sunwoo Kim
Heejin Ahn
57
0
0
29 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
125
5
0
29 Apr 2025
XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search
Yiting Zhang
Shichen Li
Elena Shrestha
40
0
0
29 Apr 2025
Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey
Mohamad Abdul Hady
Siyi Hu
Mahardhika Pratama
Jimmy Cao
Ryszard Kowalczyk
24
0
0
29 Apr 2025