1707.06347
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Papers citing "Proximal Policy Optimization Algorithms"
50 / 8,601 papers shown
Title
Simple, Good, Fast: Self-Supervised World Models Free of Baggage
Jan Robine
Marc Höftmann
Stefan Harmeling
DRL
OCL
63
1
0
03 Jun 2025
Understanding the Impact of Sampling Quality in Direct Preference Optimization
Kyung Rok Kim
Yumo Bai
Chonghuan Wang
Guanting Chen
20
0
0
03 Jun 2025
FAuNO: Semi-Asynchronous Federated Reinforcement Learning Framework for Task Offloading in Edge Systems
Frederico Metelo
Alexandre Oliveira
Stevo Racković
Pedro Ákos Costa
Cláudia Soares
OffRL
FedML
56
0
0
03 Jun 2025
Tactile MNIST: Benchmarking Active Tactile Perception
Tim Schneider
Guillaume Duret
Cristiana de Farias
Roberto Calandra
Liming Chen
Jan Peters
29
0
0
03 Jun 2025
DPO Learning with LLMs-Judge Signal for Computer Use Agents
Man Luo
David Cobbley
Xin Su
Shachar Rosenman
Vasudev Lal
Shao-Yen Tseng
Phillip Howard
44
0
0
03 Jun 2025
KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG
Yongjian Li
HaoCheng Chu
Yukun Yan
Zhenghao Liu
S. Yu
Zheni Zeng
Ruobing Wang
Sen Song
Zhiyuan Liu
Maosong Sun
42
0
0
03 Jun 2025
EgoVLM: Policy Optimization for Egocentric Video Understanding
Ashwin Vinod
Shrey Pandit
Aditya Vavre
Linshen Liu
LRM
41
0
0
03 Jun 2025
Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning
Yin Fang
Qiao Jin
Guangzhi Xiong
Bowen Jin
Xianrui Zhong
Siru Ouyang
Aidong Zhang
Jiawei Han
Zhiyong Lu
ReLM
OffRL
LRM
36
0
0
03 Jun 2025
High-speed control and navigation for quadrupedal robots on complex and discrete terrain
Hyeongjun Kim
H. Oh
Jeongsoo Park
Yunho Kim
D. Youm
Moonkyu Jung
Minho Lee
Jemin Hwangbo
58
0
0
03 Jun 2025
AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation
Prashanth Vijayaraghavan
Luyao Shi
Ehsan Degan
Vandana Mukherjee
Xin Zhang
68
0
0
03 Jun 2025
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
Zhengze Zhang
Shiqi Wang
Yiqun Shen
Simin Guo
Dahua Lin
Xiaoliang Wang
Nguyen Cam-Tu
Fei Tan
12
0
0
03 Jun 2025
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
Qining Zhang
Lei Ying
62
0
0
03 Jun 2025
NetPress: Dynamically Generated LLM Benchmarks for Network Applications
Yajie Zhou
Jiajun Ruan
Eric S. Wang
Sadjad Fouladi
Francis Y. Yan
Kevin Hsieh
Zaoxing Liu
32
0
0
03 Jun 2025
Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences
Yunhong Lu
Qichao Wang
H. Cao
Xiaoyin Xu
Min Zhang
47
0
0
03 Jun 2025
ADEPT: Adaptive Diffusion Environment for Policy Transfer Sim-to-Real
Youwei Yu
Junhong Xu
Lantao Liu
89
0
0
02 Jun 2025
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
Desen Meng
Rui Huang
Zhilin Dai
Xinhao Li
Yifan Xu
...
Z. Huang
Meng Zhang
L. Zhang
Yi Liu
Limin Wang
OffRL
VLM
LRM
56
0
0
02 Jun 2025
Reinforcement Learning with Data Bootstrapping for Dynamic Subgoal Pursuit in Humanoid Robot Navigation
Chengyang Peng
Zhihao Zhang
Shiting Gong
Sankalp Agrawal
Keith A. Redmill
Ayonga Hereid
19
0
0
02 Jun 2025
Self-Challenging Language Model Agents
Yifei Zhou
Sergey Levine
Jason Weston
Xian Li
Sainbayar Sukhbaatar
ALM
ELM
51
0
0
02 Jun 2025
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
S. Wang
Le Yu
Chang Gao
Chujie Zheng
Shixuan Liu
...
Yang Yue
S. Song
Bowen Yu
Gao Huang
Junyang Lin
LRM
62
9
0
02 Jun 2025
Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment
Kaixun Jiang
Zhaoyu Chen
Haijing Guo
Jinglun Li
Jiyuan Fu
Pinxue Guo
Hao Tang
Bo Li
Wenqiang Zhang
DiffM
AAML
77
0
0
02 Jun 2025
Bidirectional Soft Actor-Critic: Leveraging Forward and Reverse KL Divergence for Efficient Reinforcement Learning
Yixian Zhang
Huaze Tang
Changxu Wei
Wenbo Ding
50
0
0
02 Jun 2025
The Actor-Critic Update Order Matters for PPO in Federated Reinforcement Learning
Zhijie Xie
Shenghui Song
48
0
0
02 Jun 2025
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Zijian Wu
Jinjie Ni
Xiangyan Liu
Zichen Liu
Hang Yan
Michael Shieh
OffRL
ReLM
LRM
29
0
0
02 Jun 2025
Q-ARDNS-Multi: A Multi-Agent Quantum Reinforcement Learning Framework with Meta-Cognitive Adaptation for Complex 3D Environments
Umberto Gonçalves de Sousa
AI4CE
15
0
0
02 Jun 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng
Caroline Chan
F. Durand
Phillip Isola
EGVM
25
0
0
02 Jun 2025
React to Surprises: Stable-by-Design Neural Feedback Control and the Youla-REN
Nicholas H. Barbara
Ruigang Wang
Alexandre Megretski
I. Manchester
54
0
0
02 Jun 2025
Incentivizing LLMs to Self-Verify Their Answers
Fuxiang Zhang
Jiacheng Xu
Chaojie Wang
Ce Cui
Yang Liu
Bo An
ReLM
LRM
54
0
0
02 Jun 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Zhong Zhang
Yaxi Lu
Yikun Fu
Yupeng Huo
Shenzhi Yang
...
Chongyi Wang
Chi Chen
Yuan Yao
Zhiyuan Liu
Maosong Sun
LLMAG
ALM
61
0
0
02 Jun 2025
Towards Human-like Preference Profiling in Sequential Recommendation
Z. Ouyang
Qianlong Wen
Chunhui Zhang
Yanfang Ye
Soroush Vosoughi
HAI
22
0
0
02 Jun 2025
Improving LLM-Generated Code Quality with GRPO
Maxime Robeyns
Laurence Aitchison
ALM
14
0
0
02 Jun 2025
Robust and Safe Multi-Agent Reinforcement Learning Framework with Communication for Autonomous Vehicles
Keshawn Smith
Zhili Zhang
Hijaz Ahmad
Ehsan Sabouni
Maniak Mondal
Song Han
Wenchao Li
Fei Miao
19
0
0
01 Jun 2025
Reinforcement Learning with Random Time Horizons
Enric Ribera Borrell
Lorenz Richter
Christof Schütte
AI4TS
30
0
0
01 Jun 2025
Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges
Lajos Muzsai
David Imolai
András Lukács
LLMAG
LRM
11
0
0
01 Jun 2025
Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning
Jianglin Ding
Jingcheng Tang
Gangshan Jing
27
0
0
01 Jun 2025
A Reinforcement Learning Approach for RIS-aided Fair Communications
Alex Pierron
Michel Barbeau
L. D. Cicco
José Rubio-Hernán
Joaquin Garcia-Alfaro
20
0
0
01 Jun 2025
Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning
Yizhuo Zhang
Heng Wang
Shangbin Feng
Zhaoxuan Tan
Xinyun Liu
Yulia Tsvetkov
OffRL
47
0
0
01 Jun 2025
Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
Alper Kamil Bozkurt
Calin Belta
Ming C. Lin
43
0
0
01 Jun 2025
Doubly Robust Alignment for Large Language Models
Erhan Xu
Kai Ye
Hongyi Zhou
Luhan Zhu
Francesco Quinzan
Chengchun Shi
29
0
0
01 Jun 2025
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang
J. Obando-Ceron
Pablo Samuel Castro
Aaron Courville
Glen Berseth
33
0
0
31 May 2025
Central Path Proximal Policy Optimization
Nikola Milosevic
Johannes Müller
Nico Scherf
20
0
0
31 May 2025
Comparing Traditional and Reinforcement-Learning Methods for Energy Storage Control
Elinor Ginzburg
Itay Segev
Yoash Levron
Sarah Keren
OffRL
20
0
0
31 May 2025
BASIL: Best-Action Symbolic Interpretable Learning for Evolving Compact RL Policies
Kourosh Shahnazari
Seyed Moein Ayyoubzadeh
Mohammadali Keshtparvar
OffRL
35
0
0
31 May 2025
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
Y. Fu
Yuanheng Zhu
Jiajun Chai
Guojun Yin
Wei Lin
Qichao Zhang
Dongbin Zhao
20
0
0
31 May 2025
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Shelly Bensal
Umar Jamil
Christopher Bryant
M. Russak
Kiran Kamble
Dmytro Mozolevskyi
Muayad Ali
Waseem Alshikh
LLMAG
ReLM
LRM
23
0
0
30 May 2025
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Yiqing Liang
Jielin Qiu
Wenhao Ding
Zuxin Liu
James Tompkin
Mengdi Xu
Mengzhou Xia
Zhengzhong Tu
Laixi Shi
Jiacheng Zhu
OffRL
123
0
0
30 May 2025
Proactive Guidance of Multi-Turn Conversation in Industrial Search
Xiaoyu Li
Xiao Li
Li Gao
Yiding Liu
Xiaoyang Wang
Shuaiqiang Wang
Junfeng Wang
Dawei Yin
LLMAG
22
0
0
30 May 2025
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
Hongyi Cai
Junlin Wang
Xiaoyin Chen
Bhuwan Dhingra
LRM
17
0
0
30 May 2025
Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting
Wei Chen
Jiahao Zhang
Haipeng Zhu
Boyan Xu
Zijian Li
Keli Zhang
Junjian Ye
Ruichu Cai
29
1
0
30 May 2025
MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models
Srivathsan Badrinarayanan
Rishikesh Magar
Akshay Antony
Radheesh Sharma Meda
Amir Barati Farimani
AI4CE
9
0
0
30 May 2025
Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards
Xun Lu
Yunyi Yang
Yongbo Gai
Kai Luo
Shihao Huang
Jianhe Lin
Xiaoxi Jiang
Guanjun Jiang
31
0
0
30 May 2025