1707.06347
Cited By
v1
v2 (latest)
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Papers citing
"Proximal Policy Optimization Algorithms"
50 / 8,601 papers shown
Title
Two-Stage Feature Generation with Transformer and Reinforcement Learning
Wanfu Gao
Zengyao Man
Zebin He
Yuhao Tang
Jun Gao
Kunpeng Liu
16
0
0
28 May 2025
Contraction Actor-Critic: Contraction Metric-Guided Reinforcement Learning for Robust Path Tracking
Minjae Cho
Hiroyasu Tsukamoto
Huy Trong Tran
7
0
0
28 May 2025
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
Hongyi Zhou
Josiah P. Hanna
Jin Zhu
Ying Yang
Chengchun Shi
OffRL
56
0
0
28 May 2025
Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack
Juan Ren
Mark Dras
Usman Naseem
AAML
74
0
0
28 May 2025
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control
Younggyo Seo
Carmelo Sferrazza
Haoran Geng
Michal Nauman
Zhao-Heng Yin
Pieter Abbeel
OffRL
63
0
0
28 May 2025
When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks?
Eleni Nisioti
J. Pedersen
Erwan Plantec
Milton L. Montero
S. Risi
OffRL
32
0
0
28 May 2025
Oryx: a Performant and Scalable Algorithm for Many-Agent Coordination in Offline MARL
Claude Formanek
Omayma Mahjoub
Louay Ben Nessir
Sasha Abramowitz
Ruan de Kock
...
Daniel Rajaonarivonivelomanantsoa
Arnol Fokam
Siddarth S. Singh
Ulrich A. Mbou Sob
Arnu Pretorius
OffRL
36
0
0
28 May 2025
Maximizing Confidence Alone Improves Reasoning
Mihir Prabhudesai
Lili Chen
Alex Ippoliti
Katerina Fragkiadaki
Hao Liu
Deepak Pathak
OOD
OffRL
ReLM
LRM
122
3
0
28 May 2025
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym
Ngoc La
Ruaridh Mon-Williams
Julie A. Shah
14
0
0
28 May 2025
Reward-Independent Messaging for Decentralized Multi-Agent Reinforcement Learning
Naoto Yoshida
Tadahiro Taniguchi
24
0
0
28 May 2025
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning
Mattie Fellows
Clarisse Wibault
Uljad Berdica
Johannes Forkel
Jakob Foerster
Michael A. Osborne
OffRL
OnRL
60
0
0
28 May 2025
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang
Zunnan Xu
Jun Zhou
Ting Liu
Yicheng Xiao
Mingwen Ou
Bowen Ji
Xiu Li
Kehong Yuan
VLM
89
0
0
28 May 2025
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
Tian Qin
Core Francisco Park
Mujin Kwun
Aaron Walsman
Eran Malach
Nikhil Anand
Hidenori Tanaka
David Alvarez-Melis
ReLM
OffRL
LRM
79
0
0
28 May 2025
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang
Chao Yu
Sichang Su
Yu Wang
61
0
0
28 May 2025
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
Hanting Chen
Yasheng Wang
Kai Han
Dong Li
Lin Li
...
Hailin Hu
Yehui Tang
Dacheng Tao
Xinghao Chen
Yunhe Wang
LRM
93
0
0
28 May 2025
Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data
Christopher Lee Lübbers
15
0
0
28 May 2025
Training Language Models to Generate Quality Code with Program Analysis Feedback
Feng Yao
Zilong Wang
Liyuan Liu
Junxia Cui
Li Zhong
Xiaohan Fu
Haohui Mai
Vish Krishnan
Jianfeng Gao
Jingbo Shang
52
0
0
28 May 2025
How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective
Shimao Zhang
Z. Lai
Xiang Liu
Shuaijie She
Xiao Liu
Yeyun Gong
Shujian Huang
Jiajun Chen
38
0
0
27 May 2025
SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
Ting Xu
Zhichao Huang
Jiankai Sun
Shanbo Cheng
Wai Lam
OffRL
17
0
0
27 May 2025
Efficient Controllable Diffusion via Optimal Classifier Guidance
Owen Oertell
Shikun Sun
Yiding Chen
Jin Peng Zhou
Zhiyong Wang
Wen Sun
43
0
0
27 May 2025
RRO: LLM Agent Optimization Through Rising Reward Trajectories
Zilong Wang
Jingfeng Yang
Sreyashi Nag
Samarth Varshney
Xianfeng Tang
Haoming Jiang
Jingbo Shang
Sheikh Sarwar
LRM
42
0
0
27 May 2025
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley
Mingyu Chen
Zhaolin Gao
Jason D. Lee
Wen Sun
Wenhao Zhan
Xuezhou Zhang
OffRL
LRM
77
1
0
27 May 2025
Generalized Coordination of Partially Cooperative Urban Traffic
Max Bastian Mertens
M. Buchholz
10
0
0
27 May 2025
TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs
Zhehan Kan
Y. Liu
Kun Yin
Xinghua Jiang
Xin Li
...
Yinsong Liu
D. Jiang
Xing Sun
Qingmin Liao
Wenming Yang
LRM
76
0
0
27 May 2025
An Optimisation Framework for Unsupervised Environment Design
Nathan Monette
Alistair Letcher
Michael Beukman
Matthew Jackson
Alexander Rutherford
Alexander David Goldie
Jakob N. Foerster
65
0
0
27 May 2025
Learning Unified Force and Position Control for Legged Loco-Manipulation
Peiyuan Zhi
Peiyang Li
Jianqin Yin
Baoxiong Jia
Siyuan Huang
91
1
0
27 May 2025
SquareχPO: Differentially Private and Robust χ^2-Preference Optimization in Offline Direct Alignment
Xingyu Zhou
Yulian Wu
Wenqian Weng
Francesco Orabona
75
0
0
27 May 2025
Improved Representation Steering for Language Models
Zhengxuan Wu
Qinan Yu
Aryaman Arora
Christopher D. Manning
Christopher Potts
LLMSV
76
0
0
27 May 2025
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
Mingyang Song
Mao Zheng
OffRL
LRM
89
1
0
27 May 2025
EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models
Chengyu Wang
Junbing Yan
Wenrui Cai
Yuanhao Yue
Jun Huang
VLM
45
0
0
27 May 2025
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
Fuwen Luo
Shengfeng Lou
C. L. Philip Chen
Ziyue Wang
Chenliang Li
...
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
AI4TS
LRM
81
0
0
27 May 2025
Multi-objective Large Language Model Alignment with Hierarchical Experts
Zhuo Li
Guodong DU
Weiyang Guo
Yigeng Zhou
Xiucheng Li
...
Fangming Liu
Yequan Wang
Deheng Ye
Min Zhang
Jing Li
ALM
MoE
70
0
0
27 May 2025
Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning
Shijie Liu
Andrew C. Cullen
Paul Montague
S. Erfani
Benjamin I. P. Rubinstein
OffRL
AAML
42
1
0
27 May 2025
Can Large Reasoning Models Self-Train?
Sheikh Shafayat
Fahim Tajwar
Ruslan Salakhutdinov
J. Schneider
Andrea Zanette
ReLM
OffRL
LRM
76
2
0
27 May 2025
Gait-Conditioned Reinforcement Learning with Multi-Phase Curriculum for Humanoid Locomotion
Tianhu Peng
Lingfan Bao
Chengxu Zhou
39
0
0
27 May 2025
Reinforcing General Reasoning without Verifiers
Xiangxin Zhou
Zichen Liu
Anya Sims
Haonan Wang
Tianyu Pang
Chongxuan Li
Liang Wang
Min Lin
C. Du
OffRL
LRM
78
2
0
27 May 2025
TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment
Zheng Li
Mao Zheng
Mingyang Song
Wenjie Yang
37
0
0
27 May 2025
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
Yongchao Chen
Y. Liu
Junwei Zhou
Yilun Hao
Jingquan Wang
Yang Zhang
Chuchu Fan
OffRL
ReLM
AI4TS
SyDa
ALM
LRM
64
0
0
27 May 2025
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An
Ruochen Wang
Tianyi Zhou
Cho-Jui Hsieh
KELM
LRM
94
1
0
27 May 2025
Convergent Functions, Divergent Forms
Hyeonseong Jeon
Ainaz Eftekhar
Aaron Walsman
Kuo-Hao Zeng
Ali Farhadi
Ranjay Krishna
20
0
0
27 May 2025
Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners
Jiabao Ji
Yongchao Chen
Yang Zhang
Ramana Rao Kompella
Chuchu Fan
Gaowen Liu
Shiyu Chang
113
0
0
26 May 2025
Fine-grained List-wise Alignment for Generative Medication Recommendation
Chenxiao Fan
Chongming Gao
Wentao Shi
Yaxin Gong
Zihao Zhao
Fuli Feng
LM&MA
60
0
0
26 May 2025
Continuous Self-Improvement of Large Language Models by Test-time Training with Verifier-Driven Sample Selection
Mohammad Mahdi Moradi
Hossam Amer
Sudhir Mudur
Weiwei Zhang
Yang Liu
Walid Ahmed
VLM
LRM
29
0
0
26 May 2025
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Jiangjie Chen
Qianyu He
Siyu Yuan
Aili Chen
Zhicheng Cai
...
Qiying Yu
Xuefeng Li
Jiaze Chen
Hao Zhou
Mingxuan Wang
ReLM
LRM
94
2
0
26 May 2025
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni
Zhengyuan Yang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
W. Zuo
Lijuan Wang
ReLM
LRM
85
1
0
26 May 2025
Token-Importance Guided Direct Preference Optimization
Yang Ning
Lin Hai
Liu Yibo
Tian Baoliang
Liu Guoqing
Zhang Haijun
71
0
0
26 May 2025
MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection
Yinuo Xue
Eric Spero
Yun Sing Koh
Giovanni Russello
AAML
26
1
0
26 May 2025
Interleaved Reasoning for Large Language Models via Reinforcement Learning
Roy Xie
David Qiu
Deepak Gopinath
Dong Lin
Yanchao Sun
Chong-Jun Wang
Saloni Potdar
Bhuwan Dhingra
KELM
LRM
73
0
0
26 May 2025
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
Xiao Chen
Tai Wang
Quanyi Li
Tao Huang
Jiangmiao Pang
Tianfan Xue
48
0
0
26 May 2025
Learning to Reason without External Rewards
Xuandong Zhao
Zhewei Kang
Aosong Feng
Sergey Levine
Dawn Song
OffRL
ReLM
LRM
120
8
0
26 May 2025