ResearchTrend.AI
Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 6,730 papers shown
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu
Xueyi Chen
Rui Ming
Jilong Gao
Shoubo Hu
Zhuolun He
Bei Yu
LRM
19
0
0
19 May 2025
Dynamic Sight Range Selection in Multi-Agent Reinforcement Learning
Wei-Chen Liao
Ti-Rong Wu
I-Chen Wu
12
0
0
19 May 2025
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
Sunghwan Kim
Dongjin Kang
Taeyoon Kwon
Hyungjoo Chae
Dongha Lee
Jinyoung Yeo
ALM
2
0
0
19 May 2025
Action-Dependent Optimality-Preserving Reward Shaping
Grant C. Forbes
Jianxun Wang
Leonardo Villalobos-Arias
Arnav Jhala
David L. Roberts
OffRL
12
0
0
19 May 2025
LiBOG: Lifelong Learning for Black-Box Optimizer Generation
Jiyuan Pei
Yi Mei
Jialin Liu
Mengjie Zhang
7
0
0
19 May 2025
Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs
Zhuo Yang
Lingli Ge
Dong Han
Tianfan Fu
Yuqiang Li
22
0
0
19 May 2025
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
Haoyuan Wu
Rui Ming
Jilong Gao
Hangyu Zhao
Xueyi Chen
Yikai Yang
Haisheng Zheng
Zhuolun He
Bei Yu
13
0
0
19 May 2025
HIL: Hybrid Imitation Learning of Diverse Parkour Skills from Videos
Jiadong Wang
Yifeng Jiang
Haotian Zhang
Chen Tessler
Davis Rempe
Jessica Hodgins
Xue Bin Peng
7
0
0
19 May 2025
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
Haiquan Wen
Yiwei He
Zhenglin Huang
Tianxiao Li
Zihan YU
Xingru Huang
Lu Qi
Baoyuan Wu
Xuelong Li
Guangliang Cheng
VGen
9
0
0
19 May 2025
Multi-parameter Control for the (1+($λ$,$λ$))-GA on OneMax via Deep Reinforcement Learning
Tai Nguyen
Phong Le
Carola Doerr
Nguyen Dang
9
0
0
19 May 2025
Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach
Hao Fang
Kai Huang
Hao Ye
Chongtao Guo
Le Liang
Xiao Li
Shi Jin
9
0
0
19 May 2025
Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion
Zhuoheng Wang
Jinyin Zhou
Qi Wu
9
0
0
19 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
9
0
0
18 May 2025
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
12
0
0
18 May 2025
Design of a 3-DOF Hopping Robot with an Optimized Gearbox: An Intermediate Platform Toward Bipedal Robots
Jonghun Choe
Gijeong Kim
Hajun Kim
Dongyun Kang
Min-Su Kim
Hae-Won Park
2
0
0
18 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
YangouOuyang
...
Feiyu Xiong
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
4
0
0
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
9
0
0
18 May 2025
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning
Longxi Gao
Li Zhang
Mengwei Xu
2
0
0
18 May 2025
Enriching Patent Claim Generation with European Patent Dataset
Lekang Jiang
Chengzu Li
Stephan Goetz
7
0
0
18 May 2025
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
Georgiy Malaniya
Anton Bolychev
Grigory Yaremenko
Anastasia Krasnaya
Pavel Osinenko
2
0
0
18 May 2025
A universal policy wrapper with guarantees
Anton Bolychev
Georgiy Malaniya
Grigory Yaremenko
Anastasia Krasnaya
Pavel Osinenko
OffRL
7
0
0
18 May 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan
Jian Zhang
K. Li
Zhuoxuan Cai
Lujian Yao
...
Enguang Wang
Qibin Hou
Jinwei Chen
Peng-Tao Jiang
Bo Li
4
0
0
18 May 2025
Table-R1: Region-based Reinforcement Learning for Table Understanding
Zhenhe Wu
Jian Yang
Jiaheng Liu
Xianjie Wu
Changzai Pan
Jie Zhang
Yu Zhao
Shuangyong Song
Yongxiang Li
Zhoujun Li
LMTD
LRM
2
0
0
18 May 2025
Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward
Han Weng
Boyi Liu
Yuanfeng Song
Dun Zeng
Yingxiang Yang
Yi Zhan
Longjie Cui
Xiaoming Yin
Yang Sun
4
0
0
18 May 2025
SAINT: Attention-Based Modeling of Sub-Action Dependencies in Multi-Action Policies
Matthew Landers
Taylor W. Killian
Thomas Hartvigsen
Afsaneh Doryab
2
0
0
17 May 2025
Growable and Interpretable Neural Control with Online Continual Learning for Autonomous Lifelong Locomotion Learning Machines
Arthicha Srisuchinnawong
Poramate Manoonpong
CLL
LRM
2
0
0
17 May 2025
VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
Yiting Wang
Guoheng Sun
Wanghao Ye
Gang Qu
Ang Li
OffRL
3DV
LRM
VLM
7
0
0
17 May 2025
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng
Quan Wei
William Brown
Oana Frunza
Yuriy Nevmyvaka
Mingyi Hong
LRM
7
0
0
17 May 2025
PROBE: Proprioceptive Obstacle Detection and Estimation while Navigating in Clutter
Dhruv Metha Ramesh
Aravind Sivaramakrishnan
Shreesh Keskar
Kostas E. Bekris
Jingjin Yu
Abdeslam Boularias
2
0
0
17 May 2025
JULI: Jailbreak Large Language Models by Self-Introspection
Jesson Wang
Zhanhao Hu
David Wagner
4
0
0
17 May 2025
Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies
Isidoros Marougkas
Dhruv Metha Ramesh
Joe H. Doerr
Edgar Granados
Aravind Sivaramakrishnan
Abdeslam Boularias
Kostas E. Bekris
OffRL
2
0
0
17 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
25
0
0
17 May 2025
CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World
Zoya Volovikova
G. Gorbov
Petr Kuderov
Aleksandr I. Panov
A. Skrynnik
4
0
0
17 May 2025
CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction
Jing Zou
Qingqiu Li
Chenyu Lian
Lihao Liu
Xiaohan Yan
Shujun Wang
Jing Qin
VLM
2
0
0
17 May 2025
Master Rules from Chaos: Learning to Reason, Plan, and Interact from Chaos for Tangram Assembly
Chao Zhao
Chunli Jiang
Lifan Luo
Guanlan Zhang
Hongyu Yu
Michael Yu Wang
Qifeng Chen
LRM
2
0
0
17 May 2025
Bench-NPIN: Benchmarking Non-prehensile Interactive Navigation
Ninghan Zhong
Steven Caro
Avraiem Iskandar
Megnath Ramesh
Stephen L. Smith
2
0
0
17 May 2025
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
Yaorui Shi
Shihan Li
Chang Wu
Zhiyuan Liu
Junfeng Fang
Hengxing Cai
An Zhang
Xinbing Wang
ReLM
LRM
36
0
0
16 May 2025
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
Kehan Long
Jorge Cortés
Nikolay Atanasov
9
0
0
16 May 2025
Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search
Rui Liu
Rui Xie
Zijun Yao
Yanjie Fu
Dongjie Wang
2
0
0
16 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
9
0
0
16 May 2025
Deep Symbolic Optimization: Reinforcement Learning for Symbolic Mathematics
Conor F. Hayes
Felipe Leno Da Silva
Jiachen Yang
T. Nathan Mundhenk
Chak Shing Lee
...
Ahmet Can Solak
Thomas Desautels
Daniel Faissol
Brenden K. Petersen
Mikel Landajuela
22
0
0
16 May 2025
Exploration by Random Distribution Distillation
Zhirui Fang
Kai Yang
Jian Tao
Jiafei Lyu
Lusong Li
Li Shen
Xiu Li
12
0
0
16 May 2025
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
Anjiang Wei
Tarun Suresh
Huanmi Tan
Yinglun Xu
Gagandeep Singh
Alex Aiken
7
0
0
16 May 2025
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee
Lifan Yuan
Dilek Hakkani-Tur
Hao Peng
7
0
0
16 May 2025
Unifying Segment Anything in Microscopy with Multimodal Large Language Model
Manyu Li
Ruian He
Zixian Zhang
Weimin Tan
Bo Yan
VLM
12
0
0
16 May 2025
Meta-World+: An Improved, Standardized, RL Benchmark
Reginald McLean
Evangelos Chatzaroulas
Luc McCutcheon
Frank Röder
Tianhe Yu
...
Ryan Julian
Jordan Terry
Isaac Woungang
Nariman Farsad
Pablo Samuel Castro
OffRL
14
0
0
16 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Francesco Sovrano
19
0
0
16 May 2025
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Zihan Wang
Jiaqi Zeng
Olivier Delalleau
Hoo-Chang Shin
Felipe Soares
Alexander Bukharin
Ellie Evans
Yi Dong
Oleksii Kuchaiev
22
0
0
16 May 2025
Reinforcement Learning for AMR Charging Decisions: The Impact of Reward and Action Space Design
Janik Bischoff
Alexandru Rinciog
Anne Meyer
OffRL
14
0
0
16 May 2025