Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.06347
Cited By
v1
v2 (latest)
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Proximal Policy Optimization Algorithms"
50 / 626 papers shown
Title
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
Georgiy Malaniya
Anton Bolychev
Grigory Yaremenko
Anastasia Krasnaya
Pavel Osinenko
109
0
0
18 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
116
3
0
17 May 2025
CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction
Jing Zou
Qingqiu Li
Chenyu Lian
Lihao Liu
Xiaohan Yan
Shujun Wang
Jing Qin
VLM
156
0
0
17 May 2025
JULI: Jailbreak Large Language Models by Self-Introspection
Jesson Wang
Zhanhao Hu
David Wagner
108
0
0
17 May 2025
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
Kehan Long
Jorge Cortés
Nikolay Atanasov
103
1
0
16 May 2025
Tool-Aided Evolutionary LLM for Generative Policy Toward Efficient Resource Management in Wireless Federated Learning
Chongyang Tan
Ruoqi Wen
Rongpeng Li
Zhifeng Zhao
Ekram Hossain
Honggang Zhang
110
0
0
16 May 2025
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
Yaorui Shi
Shihan Li
Chang Wu
Zhiyuan Liu
Sihang Li
Hengxing Cai
An Zhang
Xiang Wang
ReLM
LRM
143
0
0
16 May 2025
Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition
Bo Yue
Shuqi Guo
Kaiyu Hu
Chujiao Wang
Benyou Wang
Kui Jia
Guiliang Liu
LRM
104
0
0
16 May 2025
ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
Wenhao Shen
Wanqi Yin
Xiaofeng Yang
Cheng Chen
Chaoyue Song
Zhongang Cai
Lei Yang
Hao Wang
Guosheng Lin
129
0
0
15 May 2025
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang
Zhiyang Chen
Zijun Wang
Tiancheng Li
Guo-Jun Qi
DiffM
LRM
AI4CE
87
2
0
15 May 2025
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
Yunjie Ji
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yiping Peng
Han Zhao
Xiangang Li
ReLM
LRM
VLM
137
3
0
13 May 2025
InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao
Zhen Ge
Sujay Sanghavi
Tian Wang
Julian Katz-Samuels
Marc Versage
Qingjun Cui
Trishul Chilimbi
177
1
0
13 May 2025
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi
Taiji Suzuki
103
1
0
12 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
X. Wu
Weinong Wang
Yingying Zhang
Wenqiang Zhang
ReLM
LRM
141
3
0
12 May 2025
X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
Prithwish Dan
Kushal Kedia
Angela Chao
Edward Weiyi Duan
Maximus Adrian Pace
Wei-Chiu Ma
Sanjiban Choudhury
90
0
0
11 May 2025
References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
Doyoung Kim
Youngjun Lee
Joeun Kim
Jihwan Bang
Hwanjun Song
Susik Yoon
Jae-Gil Lee
199
0
0
10 May 2025
Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Representation Learning
Hang Gao
Chenhao Zhang
Tie Wang
Junsuo Zhao
Fengge Wu
Changwen Zheng
Huaping Liu
LRM
186
0
0
09 May 2025
VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction
Noah Frahm
Dongxu Zhao
Andrea Dunn Beltran
Ron Alterovitz
Jan-Michael Frahm
Junier Oliva
Roni Sengupta
485
0
0
09 May 2025
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Kwan-Yee Lin
Stella X.Yu
97
0
0
09 May 2025
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu
Hanze Dong
Lei Wang
Doyen Sahoo
Junnan Li
Caiming Xiong
OffRL
LRM
110
8
0
08 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Yongqian Li
Jiaheng Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
186
5
0
08 May 2025
Guide your favorite protein sequence generative model
Junhao Xiong
Hunter Nisonoff
Maria Lukarska
Ishan Gaur
Luke M. Oltrogge
David F. Savage
Jennifer Listgarten
DiffM
229
2
0
07 May 2025
On-Device LLM for Context-Aware Wi-Fi Roaming
Ju-Hyung Lee
Yanqing Lu
Klaus Doppler
100
0
0
07 May 2025
Joint Resource Management for Energy-efficient UAV-assisted SWIPT-MEC: A Deep Reinforcement Learning Approach
Yue Chen
Hui Kang
Jiahui Li
Geng Sun
Boxiong Wang
Jiacheng Wang
Cong Liang
Shuang Liang
Dusit Niyato
211
0
0
06 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
153
2
0
05 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Zheng Zhang
Yan Li
Liang Wang
OffRL
LRM
114
6
0
05 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
97
0
0
05 May 2025
Whleaper: A 10-DOF Flexible Bipedal Wheeled Robot
Yinglei Zhu
Sixiao He
Zhenghao Qi
Zhuoyuan Yong
Yihua Qin
Jianyu Chen
64
0
0
30 Apr 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
184
8
0
30 Apr 2025
Adaptive 3D UI Placement in Mixed Reality Using Deep Reinforcement Learning
Feiyu Lu
Mengyu Chen
Hsiang Hsu
Pranav Deshpande
Cheng Yao Wang
Blair MacIntyre
84
4
0
30 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Yelong Shen
OffRL
ReLM
LRM
299
47
0
29 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
Xuzhao Li
MLLM
237
0
0
29 Apr 2025
Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation
Harry Mead
Clarissa Costen
Bruno Lacerda
Nick Hawes
123
0
0
29 Apr 2025
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
Adam Younsi
Abdalgader Abubaker
M. Seddik
Hakim Hacid
Salem Lahlou
LRM
215
1
0
28 Apr 2025
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
Yun Qu
Wenjie Wang
Yixiu Mao
Yiqin Lv
Xiangyang Ji
TTA
161
0
0
27 Apr 2025
Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
Shuang Wang
Haoyang Zhang
Tianxing Wu
Yize Zhang
W. Zhang
Quan Z. Sheng
63
0
0
27 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
152
5
0
27 Apr 2025
Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots
Brendon Johnson
Alfredo Weitzenfeld
119
1
0
26 Apr 2025
Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control
Haoran Wang
Zhiwei Shi
Chengxi Zhu
Yafei Qiao
Cheng Zhang
Fan Yang
Pengjie Ren
Lan Lu
D. Xuan
126
1
0
24 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
228
30
0
24 Apr 2025
Evolution Meets Diffusion: Efficient Neural Architecture Generation
Bingye Zhou
Caiyang Yu
DiffM
159
0
0
24 Apr 2025
High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures
A. J Miller
Fangzhou Yu
Michael Brauckmann
Farbod Farshidian
OffRL
BDL
98
1
0
24 Apr 2025
A Comprehensive Survey of Synthetic Tabular Data Generation
Ruxue Shi
Yili Wang
Mengnan Du
Xu Shen
Xin Wang
231
2
0
23 Apr 2025
Autonomous Control of Redundant Hydraulic Manipulator Using Reinforcement Learning with Action Feedback
Rohit Dhakate
Christian Brommer
C. Böhm
Stephan Weiss
J. Steinbrener
66
5
0
22 Apr 2025
CaRoSaC: A Reinforcement Learning-Based Kinematic Control of Cable-Driven Parallel Robots by Addressing Cable Sag through Simulation
Rohit Dhakate
Thomas Jantos
Eren Allak
Stephan Weiss
J. Steinbrener
76
0
0
22 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
407
31
0
22 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
130
17
0
21 Apr 2025
Improving Human-AI Coordination through Adversarial Training and Generative Models
Paresh Chaudhary
Yancheng Liang
Daphne Chen
S. Du
Natasha Jaques
122
1
0
21 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
467
0
0
21 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
147
5
0
21 Apr 2025
Previous
1
2
3
4
5
...
11
12
13
Next