ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1707.06347
  4. Cited By
Proximal Policy Optimization Algorithms
v1v2 (latest)

Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL
ArXiv (abs)PDFHTML

Papers citing "Proximal Policy Optimization Algorithms"

50 / 626 papers shown
Title
What Matters in Hierarchical Search for Combinatorial Reasoning Problems?
What Matters in Hierarchical Search for Combinatorial Reasoning Problems?
Michał Zawalski
Gracjan Góral
Michał Tyrolski
Emilia Wisnios
Franciszek Budrowski
Marek Cygan
Łukasz Kuciński
Piotr Miłoś
79
0
0
05 Jun 2024
Speeding up Policy Simulation in Supply Chain RL
Speeding up Policy Simulation in Supply Chain RL
Vivek Farias
Joren Gijsbrechts
Aryan I. Khojandi
Tianyi Peng
A. Zheng
99
0
0
04 Jun 2024
Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents
Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents
John L. Zhou
Weizhe Hong
Jonathan C. Kao
111
0
0
03 Jun 2024
Value Improved Actor Critic Algorithms
Value Improved Actor Critic Algorithms
Yaniv Oren
Moritz A. Zanger
Pascal R. van der Vaart
M. Spaan
Wendelin Bohmer
Wendelin Bohmer
OffRL
81
0
0
03 Jun 2024
Self-Improving Robust Preference Optimization
Self-Improving Robust Preference Optimization
Eugene Choi
Arash Ahmadian
Matthieu Geist
Oilvier Pietquin
M. G. Azar
90
9
0
03 Jun 2024
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
Hunter Nisonoff
Junhao Xiong
Stephan Allenspach
Jennifer Listgarten
153
44
0
03 Jun 2024
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Hyunseung Kim
ByungKun Lee
Hojoon Lee
Dongyoon Hwang
Donghu Kim
Jaegul Choo
133
1
0
01 Jun 2024
Reward Machines for Deep RL in Noisy and Uncertain Environments
Reward Machines for Deep RL in Noisy and Uncertain Environments
Andrew C. Li
Zizhao Chen
Toryn Q. Klassen
Pashootan Vaezipoor
Rodrigo Toro Icarte
Sheila A. McIlraith
133
7
0
31 May 2024
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Linjiajie Fang
Ruoxue Liu
Jing Zhang
Wenjia Wang
Bing-Yi Jing
OffRL
133
7
0
31 May 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
146
55
0
31 May 2024
HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios
HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios
Mingyang Jiang
Yueyuan Li
Songan Zhang
Siyuan Chen
Chunxiang Wang
Ming Yang
116
4
0
31 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
119
2
0
30 May 2024
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Mingqi Yuan
Roger Creus Castanyer
Bo Li
Xin Jin
Glen Berseth
Wenjun Zeng
162
0
0
29 May 2024
LNS2+RL: Combining Multi-Agent Reinforcement Learning with Large Neighborhood Search in Multi-Agent Path Finding
LNS2+RL: Combining Multi-Agent Reinforcement Learning with Large Neighborhood Search in Multi-Agent Path Finding
Yutong Wang
Tanishq Duhan
Jiaoyang Li
Guillaume Sartoretti
AI4CE
99
1
0
28 May 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
145
20
0
28 May 2024
Variational Offline Multi-agent Skill Discovery
Variational Offline Multi-agent Skill Discovery
Jiayu Chen
Bhargav Ganguly
Tian-Shing Lan
OffRL
102
3
0
26 May 2024
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
Kuo-Han Hung
Pang-Chi Lo
Jia-Fong Yeh
Han-Yuan Hsu
Yi-Ting Chen
Winston H. Hsu
133
0
0
26 May 2024
Model-free reinforcement learning with noisy actions for automated experimental control in optics
Model-free reinforcement learning with noisy actions for automated experimental control in optics
Lea Richtmann
Viktoria-S. Schmiesing
Dennis Wilken
Jan Heine
Aaron Tranter
Avishek Anand
Tobias J. Osborne
M. Heurs
85
2
0
24 May 2024
MeMo: Meaningful, Modular Controllers via Noise Injection
MeMo: Meaningful, Modular Controllers via Noise Injection
Megan Tjandrasuwita
Jie Xu
Armando Solar-Lezama
Wojciech Matusik
78
0
0
24 May 2024
Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
Hengtao Shen
MLLM
143
12
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
307
54
0
23 May 2024
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma
Zekun Wang
Joyce Chai
121
4
0
22 May 2024
Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning
Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning
Maximilian Nägele
Jan Olle
Thomas Fösel
Remmy Zen
Florian Marquardt
111
2
0
22 May 2024
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
N. Sebe
Mubarak Shah
EGVM
154
7
0
22 May 2024
Practical and efficient quantum circuit synthesis and transpiling with Reinforcement Learning
Practical and efficient quantum circuit synthesis and transpiling with Reinforcement Learning
David Kremer
Victor Villar
Hanhee Paik
Ivan Duran
Ismael Faro
Juan Cruz-Benito
87
19
0
21 May 2024
On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks
On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks
Nicholas H. Barbara
Ruigang Wang
I. Manchester
98
4
0
19 May 2024
Generalized Multi-Objective Reinforcement Learning with Envelope Updates in URLLC-enabled Vehicular Networks
Generalized Multi-Objective Reinforcement Learning with Envelope Updates in URLLC-enabled Vehicular Networks
Zijiang Yan
Hina Tabassum
65
3
0
18 May 2024
An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems
An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems
Jiyue Tao
Yunsong Zhang
Sunil Kumar Rajendran
Feitian Zhang
172
0
0
17 May 2024
I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning
I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning
Yashuai Yan
Esteve Valls Mascaro
Tobias Egle
Dongheui Lee
83
5
0
14 May 2024
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee
Jae Oh Woo
Juree Seok
Parisa Hassanzadeh
Wooseok Jang
...
Hankyu Moon
Wenjun Hu
Yeong-Dae Kwon
Taehee Lee
Seungjai Min
131
2
0
10 May 2024
CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
David Valencia
Henry Williams
Trevor Gee
Bruce A MacDonaland
Minas V. Liarokapis
Minas Liarokapis
OffRL
142
2
0
04 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
125
72
0
29 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLMLRM
215
197
0
29 Apr 2024
Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey
Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey
Lingfan Bao
Josephine N. Humphreys
Tianhu Peng
Chengxu Zhou
97
9
0
25 Apr 2024
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Amir Saeidi
Shivanshu Verma
Chitta Baral
Chitta Baral
ALM
96
26
0
23 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
183
59
0
23 Apr 2024
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Wenyi Xiao
Ziwei Huang
Leilei Gan
Wanggui He
Haoyuan Li
Zhelun Yu
Hao Jiang
Leilei Gan
Linchao Zhu
MLLM
85
34
0
22 Apr 2024
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering
Avinash Anand
Janak Kapuriya
Chhavi Kirtani
Apoorv Singh
Jay Saraf
Naman Lal
Jatin Kumar
A. Shivam
Astha Verma
R. Shah
OffRL
89
9
0
19 Apr 2024
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
96
3
0
18 Apr 2024
LTL-Constrained Policy Optimization with Cycle Experience Replay
LTL-Constrained Policy Optimization with Cycle Experience Replay
Ameesh Shah
Cameron Voloshin
Chenxi Yang
Abhinav Verma
Swarat Chaudhuri
Sanjit A. Seshia
133
1
0
17 Apr 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Nan Du
Xiaolong Li
SyDaLRMReLM
164
38
0
16 Apr 2024
Learn Your Reference Model for Real Good Alignment
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski
Boris Shaposhnikov
Alexey Malakhov
Nikita Surnachev
Yaroslav Aksenov
Ian Maksimov
Nikita Balagansky
Daniil Gavrilov
OffRL
119
35
0
15 Apr 2024
High-Dimension Human Value Representation in Large Language Models
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
185
6
0
11 Apr 2024
Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models
Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models
Yutao Ouyang
Jinhan Li
Yunfei Li
Zhongyu Li
Chao Yu
Koushil Sreenath
Yi Wu
125
15
0
08 Apr 2024
Learning Heuristics for Transit Network Design and Improvement with Deep Reinforcement Learning
Learning Heuristics for Transit Network Design and Improvement with Deep Reinforcement Learning
Andrew Holliday
A. El-geneidy
Gregory Dudek
124
0
0
08 Apr 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
163
17
0
07 Apr 2024
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang
Marc Marone
Tianjian Li
Benjamin Van Durme
Daniel Khashabi
163
9
0
05 Apr 2024
A Survey on Large Language Model-Based Game Agents
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Gaowen Liu
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAGLM&RoAI4CELM&MA
198
57
0
02 Apr 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal
Ashima Suvarna
Gantavya Bhatt
Nanyun Peng
Kai-Wei Chang
Aditya Grover
ALM
139
11
0
31 Mar 2024
Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
Qi Gou
Cam-Tu Nguyen
105
13
0
28 Mar 2024
Previous
123...10111213
Next