Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 8,599 papers shown
Self-Predictive Dynamics for Generalization of Vision-based Reinforcement Learning
Kyungsoo Kim
Jeongsoo Ha
Yusung Kim
BDL
42
7
0
05 Jun 2025
Whole-Body Constrained Learning for Legged Locomotion via Hierarchical Optimization
Haoyu Wang
Ruyi Zhou
L. Ding
Tie Liu
Zhelin Zhang
P. Xu
Haibo Gao
Z. Deng
97
0
0
05 Jun 2025
Enhancing Efficiency and Propulsion in Bio-mimetic Robotic Fish through End-to-End Deep Reinforcement Learning
Xinyu Cui
Boai Sun
Yi Zhu
Ning Yang
Haifeng Zhang
Weicheng Cui
D. Fan
Jun Wang
159
9
0
05 Jun 2025
Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning
Yunsheng Tian
Joshua Jacob
Yijiang Huang
Jialiang Zhao
Edward Gu
...
Branden Romero
Sachin Chitta
Shinjiro Sueda
Hui Li
Wojciech Matusik
92
0
0
05 Jun 2025
TreeRPO: Tree Relative Policy Optimization
Zhicheng YANG
Zhijiang Guo
Yinya Huang
Xiaodan Liang
Yiwei Wang
Jing Tang
LRM
82
0
0
05 Jun 2025
SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang
Wenxuan Ding
Shangbin Feng
Greg Durrett
Yulia Tsvetkov
88
0
0
05 Jun 2025
Composing Agents to Minimize Worst-case Risk
Guruprerana Shabadi
Rajeev Alur
73
0
0
05 Jun 2025
Amortized variational transdimensional inference
Laurence Davies
Dan Mackinlay
Rafael Oliveira
Scott A. Sisson
DRL BDL
111
0
0
05 Jun 2025
On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models
Xingwu Chen
Tianle Li
Difan Zou
LRM
99
0
0
05 Jun 2025
RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation
Tianjiao Li
Mengran Yu
Chenyu Shi
Yanjun Zhao
Xiaojing Liu
Qiang Zhang
Qi Zhang
Xuanjing Huang
Jiayin Wang
93
0
0
05 Jun 2025
PulseRide: A Robotic Wheelchair for Personalized Exertion Control with Human-in-the-Loop Reinforcement Learning
Azizul Zahid
Bibek Poudel
Danny Scott
Jason Scott
Scott Crouter
Weizi Li
Sai Swaminathan
107
0
0
05 Jun 2025
Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models
Fei Ding
Baiqiao Wang
Zijian Zeng
Youwei Wang
LRM
89
0
0
05 Jun 2025
ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning
Zhao Jin
Zhengping Che
Zhen Zhao
Kun Wu
Yuheng Zhang
...
Qiang Zhang
Xiaozhu Ju
Jing Tian
Yousong Xue
Jian Tang
VGen
161
0
0
05 Jun 2025
Customizing Speech Recognition Model with Large Language Model Feedback
Shaoshi Ling
Guoli Ye
17
0
0
05 Jun 2025
ProRefine: Inference-time Prompt Refinement with Textual Feedback
Deepak Pandita
Tharindu Cyril Weerasooriya
A. Shah
Christopher Homan
Wei Wei
LLMAG ReLM LRM
145
0
0
05 Jun 2025
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Yifan Sun
Jingyan Shen
Yibin Wang
Tianyu Chen
Zhendong Wang
Mingyuan Zhou
Huan Zhang
80
0
0
05 Jun 2025
Realizing Text-Driven Motion Generation on NAO Robot: A Reinforcement Learning-Optimized Control Pipeline
Zihan Xu
Mengxian Hu
Kaiyan Xiao
Qin Fang
Chengju Liu
Qijun Chen
81
0
0
05 Jun 2025
Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Anirudh Bharadwaj
Chaitanya Malaviya
Nitish Joshi
Mark Yatskar
123
0
0
05 Jun 2025
Dissecting Long Reasoning Models: An Empirical Study
Yongyu Mu
Jiali Zeng
Bei Li
Xinyan Guan
Fandong Meng
Jie Zhou
Tong Xiao
Jingbo Zhu
OffRL LRM
100
0
0
05 Jun 2025
When Maximum Entropy Misleads Policy Optimization
Ruipeng Zhang
Ya-Chien Chang
Sicun Gao
29
0
0
05 Jun 2025
Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts
Saptarshi Nath
Christos Peridis
Eseoghene Benjamin
Xinran Liu
Soheil Kolouri
Peter Kinnell
Zexin Li
Cong Liu
Shirin Dora
Andrea Soltoggio
25
0
0
05 Jun 2025
Robustness Evaluation for Video Models with Reinforcement Learning
Ashwin Ramesh Babu
Sajad Mousavi
Vineet Gundecha
Sahand Ghorbanpour
Avisek Naug
Antonio Guillen
Ricardo Luna Gutierrez
Soumyendu Sarkar
AAML
14
0
0
05 Jun 2025
RewardAnything: Generalizable Principle-Following Reward Models
Zhuohao Yu
Jiali Zeng
Weizheng Gu
Yidong Wang
Jindong Wang
Fandong Meng
Jie Zhou
Yue Zhang
Shikun Zhang
Wei Ye
LRM
97
1
0
04 Jun 2025
Unsupervised Meta-Testing with Conditional Neural Processes for Hybrid Meta-Reinforcement Learning
S. E. Ada
Emre Ugur
BDL
48
1
0
04 Jun 2025
Multimodal Tabular Reasoning with Privileged Structured Information
Jun-Peng Jiang
Yu Xia
Hai-Long Sun
Shiyin Lu
Qing-Guo Chen
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
LMTD LRM
89
0
0
04 Jun 2025
Aligning Large Language Models with Implicit Preferences from User-Generated Content
Zhaoxuan Tan
Zheng Li
Tianyi Liu
Haodong Wang
Hyokun Yun
...
Yifan Gao
Ruijie Wang
Priyanka Nigam
Bing Yin
Meng Jiang
67
0
0
04 Jun 2025
SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
Jiaheng Hu
Peter Stone
Roberto Martín-Martín
102
0
0
04 Jun 2025
Robust Preference Optimization via Dynamic Target Margins
Jie Sun
Junkang Wu
Jiancan Wu
Zhibo Zhu
Xingyu Lu
Jun Zhou
Lintao Ma
Xiang Wang
46
0
0
04 Jun 2025
LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward
Yi Zhao
Siqi Wang
Jing Li
49
0
0
04 Jun 2025
Multi-objective Aligned Bidword Generation Model for E-commerce Search Advertising
Zhenhui Liu
Chunyuan Yuan
Ming Pang
Zheng Fang
Li Yuan
Xue Jiang
Changping Peng
Zhangang Lin
Zheng Luo
Jingping Shao
69
0
0
04 Jun 2025
Autonomous Vehicle Lateral Control Using Deep Reinforcement Learning with MPC-PID Demonstration
Chengdong Wu
Sven Kirchner
Nils Purschke
Alois Knoll
60
0
0
04 Jun 2025
Leveraging Reward Models for Guiding Code Review Comment Generation
Oussama Ben Sghaier
Rosalia Tufano
Gabriele Bavota
Houari Sahraoui
17
0
0
04 Jun 2025
Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark
Ziming Cheng
Binrui Xu
Lisheng Gong
Zuhe Song
Tianshuo Zhou
...
Wei Chen
Zhiyuan Huang
Mingjie Zhan
Xiaojie Wang
Fangxiang Feng
VLM LRM
46
1
0
04 Jun 2025
Interpretability by Design for Efficient Multi-Objective Reinforcement Learning
Qiyue Xia
J. Michael Herrmann
53
0
0
04 Jun 2025
Misalignment or misuse? The AGI alignment tradeoff
Max Hellrigel-Holderbaum
Leonard Dung
66
0
0
04 Jun 2025
SAGE:Specification-Aware Grammar Extraction for Automated Test Case Generation with LLMs
Aditi
Hyunwoo Park
Sicheol Sung
Yo-Sub Han
Sang-Ki Ko
10
0
0
04 Jun 2025
CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design
Yifeng Xiao
Yurong Xu
Ning Yan
Masood S. Mortazavi
Pierluigi Nuzzo
107
0
0
04 Jun 2025
Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving
Li Zeqiao
Wang Yijing
Wang Haoyu
Li Zheng
Li Peng
Zuo zhiqiang
Hu Chuan
105
0
0
04 Jun 2025
PPO in the Fisher-Rao geometry
Razvan-Andrei Lascu
David Siska
Łukasz Szpruch
39
0
0
04 Jun 2025
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning
Qingfei Zhao
Ruobing Wang
Dingling Xu
Daren Zha
Limin Liu
AI4TS KELM LRM
70
0
0
04 Jun 2025
Enhancing Decision-Making of Large Language Models via Actor-Critic
Heng Dong
Kefei Duan
Chongjie Zhang
LLMAG
22
0
0
04 Jun 2025
Self-Composing Policies for Scalable Continual Reinforcement Learning
Mikel Malagón
Josu Ceberio
Jose A. Lozano
CLL
22
5
0
04 Jun 2025
High-speed control and navigation for quadrupedal robots on complex and discrete terrain
Hyeongjun Kim
H. Oh
Jeongsoo Park
Yunho Kim
D. Youm
Moonkyu Jung
Minho Lee
Jemin Hwangbo
56
0
0
03 Jun 2025
CORA: Coalitional Rational Advantage Decomposition for Multi-Agent Policy Gradients
Mengda Ji
Genjiu Xu
Liying Wang
22
0
0
03 Jun 2025
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
Zhengze Zhang
Shiqi Wang
Yiqun Shen
Simin Guo
Dahua Lin
Xiaoliang Wang
Nguyen Cam-Tu
Fei Tan
7
0
0
03 Jun 2025
KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG
Yongjian Li
HaoCheng Chu
Yukun Yan
Zhenghao Liu
S. Yu
Zheni Zeng
Ruobing Wang
Sen Song
Zhiyuan Liu
Maosong Sun
37
0
0
03 Jun 2025
DPO Learning with LLMs-Judge Signal for Computer Use Agents
Man Luo
David Cobbley
Xin Su
Shachar Rosenman
Vasudev Lal
Shao-Yen Tseng
Phillip Howard
42
0
0
03 Jun 2025
AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation
Prashanth Vijayaraghavan
Luyao Shi
Ehsan Degan
Vandana Mukherjee
Xin Zhang
68
0
0
03 Jun 2025
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
Andre He
Daniel Fried
Sean Welleck
56
0
0
03 Jun 2025
NetPress: Dynamically Generated LLM Benchmarks for Network Applications
Yajie Zhou
Jiajun Ruan
Eric S. Wang
Sadjad Fouladi
Francis Y. Yan
Kevin Hsieh
Zaoxing Liu
27
0
0
03 Jun 2025