ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1707.06347
  4. Cited By
Proximal Policy Optimization Algorithms

Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL
ArXivPDFHTML

Papers citing "Proximal Policy Optimization Algorithms"

50 / 6,779 papers shown
Title
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Junshu Pan
Wei Shen
Shulin Huang
Qiji Zhou
Yue Zhang
74
0
0
22 Apr 2025
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
Jinda Lu
Jinghan Li
Yuan Gao
Junkang Wu
Jiancan Wu
Xuben Wang
Xiangnan He
168
0
0
22 Apr 2025
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
46
1
0
22 Apr 2025
TTRL: Test-Time Reinforcement Learning
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Shang Qu
Li Sheng
Xuekai Zhu
Biqing Qi
Youbang Sun
Ganqu Cui
Ning Ding
Bowen Zhou
OffRL
162
2
0
22 Apr 2025
GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network
GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network
Wenjing Xiao
Chenglong Shi
Miaojiang Chen
Zhiquan Liu
Min Chen
Haiwen Song
34
0
0
22 Apr 2025
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
Simone Papicchio
Simone Rossi
Luca Cagliero
Paolo Papotti
ReLM
LMTD
AI4TS
LRM
60
0
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
44
0
0
21 Apr 2025
Improving Human-AI Coordination through Adversarial Training and Generative Models
Improving Human-AI Coordination through Adversarial Training and Generative Models
Paresh Chaudhary
Yancheng Liang
Daphne Chen
S. Du
Natasha Jaques
69
0
0
21 Apr 2025
Efficient Pretraining Length Scaling
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
179
0
0
21 Apr 2025
DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution
DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution
Miaomiao Cai
Simiao Li
Wei Li
X. Y. Huang
Hanting Chen
Jie Hu
Yunhe Wang
32
0
0
21 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Wenbo Zhang
OffRL
31
0
0
21 Apr 2025
A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment
A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment
K. Huang
Han Wang
Yu-Juan Luo
Jingyu Chen
Jintao Chen
Xiangkui Zhang
Xiangyang Ji
Huaping Liu
33
0
0
21 Apr 2025
In-context Ranking Preference Optimization
In-context Ranking Preference Optimization
Junda Wu
Rohan Surana
Zhouhang Xie
Yiran Shen
Yu Xia
Tong Yu
Ryan Rossi
Prithviraj Ammanabrolu
Julian McAuley
40
0
0
21 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
Jianmin Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
172
3
0
21 Apr 2025
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Yatong Bai
Jonah Casebeer
Somayeh Sojoudi
Nicholas J. Bryan
DiffM
VLM
63
1
0
21 Apr 2025
Dynamic Legged Ball Manipulation on Rugged Terrains with Hierarchical Reinforcement Learning
Dynamic Legged Ball Manipulation on Rugged Terrains with Hierarchical Reinforcement Learning
Dongjie Zhu
Zhuo Yang
Tianhang Wu
Luzhou Ge
X. Li
Qi Liu
Xuzhao Li
29
0
0
21 Apr 2025
Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension
Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension
Lin Li
Wei Chen
Jiahui Li
Lu Chen
LRM
48
1
0
20 Apr 2025
Surrogate Fitness Metrics for Interpretable Reinforcement Learning
Surrogate Fitness Metrics for Interpretable Reinforcement Learning
Philipp Altmann
Céline Davignon
Maximilian Zorn
Fabian Ritz
Claudia Linnhoff-Popien
Thomas Gabor
29
0
0
20 Apr 2025
Deep Reinforcement Learning for Investor-Specific Portfolio Optimization: A Volatility-Guided Asset Selection Approach
Deep Reinforcement Learning for Investor-Specific Portfolio Optimization: A Volatility-Guided Asset Selection Approach
Arishi Orra
Aryan Bhambu
Himanshu Choudhary
Manoj Thakur
Selvaraju Natarajan
14
0
0
20 Apr 2025
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction
Wenke Xia
Ruoxuan Feng
Dong Wang
Di Hu
32
0
0
20 Apr 2025
Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline
Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline
Hui Zhou
Shaoshuai Shi
Hongsheng Li
OffRL
32
0
0
20 Apr 2025
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Li He
He Zhao
Stephen Wan
Dadong Wang
Lina Yao
Tongliang Liu
38
0
0
19 Apr 2025
HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
Jiakai Tang
Jingsen Zhang
Zihang Tian
Xueyang Feng
Lei Wang
Xu Chen
OffRL
183
0
0
19 Apr 2025
Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization
Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization
Shouwei Ruan
Zhenyu Wu
Yao Huang
Ruochen Zhang
Yitong Sun
Caixin Kang
Xingxing Wei
EGVM
53
0
0
19 Apr 2025
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Shihan Dou
Muling Wu
Jingwen Xu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL
LRM
32
0
0
19 Apr 2025
Quantum-Enhanced Reinforcement Learning for Power Grid Security Assessment
Quantum-Enhanced Reinforcement Learning for Power Grid Security Assessment
Benjamin M. Peter
Mert Korkali
29
0
0
19 Apr 2025
Optimal Lattice Boltzmann Closures through Multi-Agent Reinforcement Learning
Optimal Lattice Boltzmann Closures through Multi-Agent Reinforcement Learning
Paul Fischer
Sebastian Kaltenbach
Sergey Litvinov
Sauro Succi
Petros Koumoutsakos
AI4CE
31
0
0
19 Apr 2025
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Jiyuan Shi
Xinzhe Liu
Dewei Wang
Ouyang Lu
Sören Schwertfeger
Fuchun Sun
Chenjia Bai
X. Li
47
0
0
19 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
61
13
0
18 Apr 2025
Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots
Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots
Zhengzhang Chen
Yan Xia
Jiayuan Liu
Jijia Liu
Wenhao Tang
...
Hongen Liao
Yu-Ping Wang
Chao Yu
Boyu Zhang
Fei Xing
24
0
0
18 Apr 2025
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni
Jiachen Pu
Zhongyi Yang
Kun Zhou
Hui Wang
Xiaoliang Xiao
Dakui Wang
Xin Li
Jingfeng Luo
Conggang Hu
39
0
0
18 Apr 2025
Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning
Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning
R. P. Singh
M. Morisawa
M. Benallegue
Zhaoming Xie
F. Kanehiro
27
0
0
18 Apr 2025
Compile Scene Graphs with Reinforcement Learning
Compile Scene Graphs with Reinforcement Learning
Zuyao Chen
Jinlin Wu
Zhen Lei
Marc Pollefeys
Chang Wen Chen
OffRL
LRM
57
0
0
18 Apr 2025
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Yixuan Even Xu
Yash Savani
Fei Fang
Zico Kolter
OffRL
42
2
0
18 Apr 2025
Recursive Deep Inverse Reinforcement Learning
Recursive Deep Inverse Reinforcement Learning
Paul Ghanem
Michael Potter
Owen Howell
Pau Closas
A. Ramezani
Deniz Erdogmus
Tales Imbiriba
32
0
0
17 Apr 2025
Imperative MPC: An End-to-End Self-Supervised Learning with Differentiable MPC for UAV Attitude Control
Imperative MPC: An End-to-End Self-Supervised Learning with Differentiable MPC for UAV Attitude Control
Haonan He
Yuheng Qiu
Junyi Geng
86
0
0
17 Apr 2025
UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
Yunsheng Wu
Korrawe Karunratanakul
Zhengyi Luo
Siyu Tang
DiffM
VGen
AI4CE
52
0
0
17 Apr 2025
Science-T2I: Addressing Scientific Illusions in Image Synthesis
Science-T2I: Addressing Scientific Illusions in Image Synthesis
Jialuo Li
Wenhao Chai
Xingyu Fu
Haiyang Xu
Saining Xie
MedIm
45
0
0
17 Apr 2025
TraCeS: Trajectory Based Credit Assignment From Sparse Safety Feedback
TraCeS: Trajectory Based Credit Assignment From Sparse Safety Feedback
Siow Meng Low
Akshat Kumar
48
0
0
17 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
188
0
0
17 Apr 2025
Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
Xiaotian Zhang
Ruizhe Chen
Yang Feng
Zuozhu Liu
45
0
0
17 Apr 2025
Evolutionary Policy Optimization
Evolutionary Policy Optimization
Zelal Su "Lain" Mustafaoglu
Keshav Pingali
Risto Miikkulainen
31
0
0
17 Apr 2025
Energy-Based Reward Models for Robust Language Model Alignment
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab
Ruqi Zhang
179
0
0
17 Apr 2025
Aligning Constraint Generation with Design Intent in Parametric CAD
Aligning Constraint Generation with Design Intent in Parametric CAD
Evan Casey
Tianyu Zhang
Shu Ishida
John Roger Thompson
Amir Hosein Khasahmadi
Joseph George Lambourne
P. Jayaraman
K. Willis
38
0
0
17 Apr 2025
CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation
CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation
Elahe Khatibi
Ziyu Wang
Amir M. Rahmani
48
0
0
17 Apr 2025
Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Tyler Ga Wei Lum
Olivia Y. Lee
C. Karen Liu
Jeannette Bohg
45
1
0
17 Apr 2025
pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild
pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild
Jonas Myhre Schiøtt
Viktor Sebastian Petersen
Dimitrios P. Papadopoulos
VLM
35
0
0
16 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
38
0
0
16 Apr 2025
A Graph-Based Reinforcement Learning Approach with Frontier Potential Based Reward for Safe Cluttered Environment Exploration
A Graph-Based Reinforcement Learning Approach with Frontier Potential Based Reward for Safe Cluttered Environment Exploration
Gabriele Calzolari
Vidya Sumathy
Christoforos Kanellakis
G. Nikolakopoulos
192
0
0
16 Apr 2025
R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors
R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors
Haoyang Wang
Liming Liu
Peiheng Wang
Junlin Hao
Jiangkai Wu
Xinggong Zhang
26
0
0
16 Apr 2025
Previous
123...567...134135136
Next