Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.06347
Cited By
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Proximal Policy Optimization Algorithms"
50 / 6,731 papers shown
Title
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
Xuzhao Li
MLLM
71
0
0
29 Apr 2025
Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey
Mohamad Abdul Hady
Siyi Hu
Mahardhika Pratama
Jimmy Cao
Ryszard Kowalczyk
24
0
0
29 Apr 2025
Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation
Harry Mead
Clarissa Costen
Bruno Lacerda
Nick Hawes
24
0
0
29 Apr 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG
KELM
OffRL
LRM
170
2
0
28 Apr 2025
ARMOR: Adaptive Meshing with Reinforcement Optimization for Real-time 3D Monitoring in Unexposed Scenes
Yizhe Zhang
Jianping Li
Xin Zhao
Fuxun Liang
Z. Dong
Bisheng Yang
AI4CE
33
0
0
28 Apr 2025
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
Adam Younsi
Abdalgader Abubaker
M. Seddik
Hakim Hacid
Salem Lahlou
LRM
57
0
0
28 Apr 2025
GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets
Mingqian He
Fei Zhao
Chonggang Lu
Ziqiang Liu
Yuping Wang
Haofu Qian
OffRL
AI4TS
VLM
72
0
0
28 Apr 2025
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
J. Zhang
Flood Sung
Zhiyong Yang
Yang Gao
Chongjie Zhang
LLMAG
44
0
0
28 Apr 2025
Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control
Heisei Yonezawa
Ansei Yonezawa
Itsuro Kajiwara
49
0
0
28 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
91
0
0
27 Apr 2025
Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
Shuang Wang
H. Zhang
Tianxing Wu
Yuyao Zhang
W. Zhang
Quan Z. Sheng
29
0
0
27 Apr 2025
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang
Chin-Ting Hsu
Chan-Hung Yu
Saransh Agrawal
Shih-Cheng Huang
Shang-Tse Chen
Kuan-Hao Huang
Shao-Hua Sun
81
0
0
27 Apr 2025
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
Yun Qu
Luu Anh Tuan
Yixiu Mao
Yiqin Lv
Xiangyang Ji
TTA
90
0
0
27 Apr 2025
HyperController: A Hyperparameter Controller for Fast and Stable Training of Reinforcement Learning Neural Networks
J. Gornet
Yiannis Kantaros
Bruno Sinopoli
161
0
0
27 Apr 2025
Neurophysiologically Realistic Environment for Comparing Adaptive Deep Brain Stimulation Algorithms in Parkinson Disease
Ekaterina Kuzmina
Dmitrii Kriukov
Mikhail Lebedev
Dmitry Dylov
OOD
29
0
0
26 Apr 2025
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng
Feishi Wang
Songlin Wei
Y. Li
Bangjun Wang
...
Hao Dong
Siyuan Huang
Yue Wang
Jitendra Malik
Pieter Abbeel
85
4
0
26 Apr 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
68
0
0
26 Apr 2025
Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots
Brendon Johnson
Alfredo Weitzenfeld
29
0
0
26 Apr 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM
LRM
86
0
0
25 Apr 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jun Liu
Hangyu Guo
Ranjie Duan
Xingyuan Bu
Yancheng He
...
Yingshui Tan
Yanan Wu
Jihao Gu
Heng Chang
Jun Zhu
MLLM
190
0
0
25 Apr 2025
Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding
Kun Li
Jingbo Wang
Yangfan He
Xinyuan Song
Ruoyu Wang
...
Keqin Li
Sida Li
Miao Zhang
Tianyu Shi
Xueqian Wang
50
0
0
25 Apr 2025
TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
Gwen Yidou Weng
Benjie Wang
Mathias Niepert
BDL
170
0
0
25 Apr 2025
RL-Driven Data Generation for Robust Vision-Based Dexterous Grasping
Atsushi Kanehira
Naoki Wake
Kazuhiro Sasabuchi
Jun Takamatsu
Katsushi Ikeuchi
42
0
0
25 Apr 2025
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
Caia Costello
Simon Guo
Anna Goldie
Azalia Mirhoseini
ReLM
SyDa
LRM
116
1
0
25 Apr 2025
Learning from Less: SINDy Surrogates in RL
Aniket Dixit
Muhammad Ibrahim Khan
Faizan Ahmed
James Brusey
45
0
0
25 Apr 2025
AI Awareness
Xianrui Li
Haoyuan Shi
Rongwu Xu
Wei Xu
59
0
0
25 Apr 2025
Evolution Meets Diffusion: Efficient Neural Architecture Generation
Bingye Zhou
Caiyang Yu
DiffM
60
0
0
24 Apr 2025
Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control
Haoran Wang
Zhiwei Shi
Chengxi Zhu
Yafei Qiao
Cheng Zhang
Fan Yang
Pengjie Ren
Lan Lu
D. Xuan
71
1
0
24 Apr 2025
Do We Need Transformers to Play FPS Video Games?
Karmanbir Batth
Krish Sethi
Aly Shariff
Leo Shi
Hetul Patel
OffRL
AI4CE
36
0
0
24 Apr 2025
High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures
A. J Miller
Fangzhou Yu
Michael Brauckmann
Farbod Farshidian
OffRL
BDL
23
0
0
24 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
92
4
0
24 Apr 2025
Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning
Mingqi Yuan
Qi Wang
Guozheng Ma
Bo-wen Li
Xin Jin
Yunbo Wang
Xiaokang Yang
Wenjun Zeng
D. Tao
OffRL
AI4CE
38
0
0
24 Apr 2025
Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization
Hongshu Guo
Wenjie Qiu
Zeyuan Ma
Xuzhi Zhang
Jun Zhang
Jiawei Liu
51
1
0
24 Apr 2025
CaRL: Learning Scalable Planning Policies with Simple Rewards
Bernhard Jaeger
D. Dauner
Jens Beißwenger
Simon Gerstenecker
Kashyap Chitta
Andreas Geiger
60
0
0
24 Apr 2025
A Comprehensive Survey of Synthetic Tabular Data Generation
Ruxue Shi
Yili Wang
Mengnan Du
Xu Shen
Xin Wang
49
2
0
23 Apr 2025
Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator
Chenhao Li
Andreas Krause
Marco Hutter
OffRL
33
0
0
23 Apr 2025
Reinforcement learning framework for the mechanical design of microelectronic components under multiphysics constraints
S. Nair
Timothy F. Walsh
Greg Pickrell
Fabio Semperlotti
32
0
0
23 Apr 2025
SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward
Nicolas Jonason
Luca Casini
Bob L. T. Sturm
29
1
0
23 Apr 2025
Param
Δ
Δ
Δ
for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
85
0
0
23 Apr 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
Wenxuan Li
Hang Zhao
Zhiyuan Yu
Yu Du
Qin Zou
Ruizhen Hu
K. Xu
SSL
83
1
0
23 Apr 2025
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
Balaji Rao
William Eiers
Carlo Lipizzi
37
0
0
23 Apr 2025
Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms
Hsin-Jung Yang
Mahsa Khosravi
Benjamin Walt
Girish Krishnan
Soumik Sarkar
15
0
0
23 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
56
0
0
22 Apr 2025
CaRoSaC: A Reinforcement Learning-Based Kinematic Control of Cable-Driven Parallel Robots by Addressing Cable Sag through Simulation
Rohit Dhakate
Thomas Jantos
Eren Allak
Stephan Weiss
J. Steinbrener
37
0
0
22 Apr 2025
Learning Explainable Dense Reward Shapes via Bayesian Optimization
Ryan Koo
Ian Yang
Vipul Raheja
Mingyi Hong
Kwang-Sung Jun
Dongyeop Kang
31
0
0
22 Apr 2025
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Junshu Pan
Wei Shen
Shulin Huang
Qiji Zhou
Yue Zhang
74
0
0
22 Apr 2025
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Siyu Zhou
Tianyi Zhou
Yijun Yang
Guodong Long
Deheng Ye
Jing Jiang
Chengqi Zhang
LM&Ro
32
0
0
22 Apr 2025
Autonomous Control of Redundant Hydraulic Manipulator Using Reinforcement Learning with Action Feedback
Rohit Dhakate
Christian Brommer
C. Böhm
Stephan Weiss
J. Steinbrener
36
5
0
22 Apr 2025
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
Jinda Lu
Jinghan Li
Yuan Gao
Junkang Wu
Jiancan Wu
Xuben Wang
Xiangnan He
162
0
0
22 Apr 2025
Dynamic Early Exit in Reasoning Models
Chenxu Yang
Qingyi Si
Yongjie Duan
Zheliang Zhu
Chenyu Zhu
Zheng-Shen Lin
Zheng Lin
Li Cao
Weiping Wang
ReLM
LRM
48
0
0
22 Apr 2025
Previous
1
2
3
4
5
6
...
133
134
135
Next