1905.01756
Cited By
P3O: Policy-on Policy-off Policy Optimization
5 May 2019
Rasool Fakoor
Pratik Chaudhari
Alex Smola
OffRL
Papers citing
"P3O: Policy-on Policy-off Policy Optimization"
33 / 33 papers shown
Title
Teaching Large Language Models to Reason through Learning and Forgetting
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLM
CLL
LRM
186
0
0
15 Apr 2025
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
S. Sarkar
49
0
0
21 Feb 2025
AlphaRouter: Quantum Circuit Routing with Reinforcement Learning and Tree Search
Wei Tang
Yiheng Duan
Yaroslav Kharkov
Rasool Fakoor
Eric Kessler
Yunong Shi
45
2
0
07 Oct 2024
SAPG: Split and Aggregate Policy Gradients
Jayesh Singla
Ananye Agarwal
Deepak Pathak
OffRL
OnRL
42
3
0
29 Jul 2024
Learning the Target Network in Function Space
Kavosh Asadi
Yao Liu
Shoham Sabach
Ming Yin
Rasool Fakoor
43
0
0
03 Jun 2024
Knowledge Transfer for Cross-Domain Reinforcement Learning: A Systematic Review
Sergio A. Serrano
J. Martínez-Carranza
L. Sucar
36
0
0
26 Apr 2024
Revisiting Experience Replayable Conditions
Taisuke Kobayashi
32
3
0
15 Feb 2024
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
Nicholas Corrado
Josiah P. Hanna
OffRL
20
1
0
14 Nov 2023
Budgeting Counterfactual for Offline RL
Yao Liu
Pratik Chaudhari
Rasool Fakoor
OffRL
25
2
0
12 Jul 2023
Distillation Policy Optimization
Jianfei Ma
OffRL
26
1
0
01 Feb 2023
Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning
Haoxuan Pan
Deheng Ye
Xiaoming Duan
Qiang Fu
Wei Yang
Jianping He
Mingfei Sun
OffRL
25
2
0
20 Jan 2023
Time-Varying Propensity Score to Bridge the Gap between the Past and Present
Rasool Fakoor
Jonas W. Mueller
Zachary Chase Lipton
Pratik Chaudhari
Alexander J. Smola
OOD
AI4TS
32
3
0
04 Oct 2022
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
32
2
0
28 Jun 2022
Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges
Massimo Caccia
Jonas W. Mueller
Taesup Kim
Laurent Charlin
Rasool Fakoor
CLL
32
8
0
28 May 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
41
8
0
20 May 2022
Learning to Constrain Policy Optimization with Virtual Trust Region
Hung Le
Thommen Karimpanal George
Majid Abdolshah
D. Nguyen
Kien Do
Sunil R. Gupta
Svetha Venkatesh
30
3
0
20 Apr 2022
Faster Deep Reinforcement Learning with Slower Online Network
Kavosh Asadi
Rasool Fakoor
Omer Gottesman
Taesup Kim
Michael L. Littman
Alexander J. Smola
OnRL
13
6
0
10 Dec 2021
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka
Tim Welschehold
Joschka Boedecker
Wolfram Burgard
OffRL
30
9
0
24 Nov 2021
Generalized Proximal Policy Optimization with Sample Reuse
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
34
47
0
29 Oct 2021
Cautious Actor-Critic
Lingwei Zhu
Toshinori Kitamura
Takamitsu Matsubara
AAML
33
1
0
12 Jul 2021
Coordinate-wise Control Variates for Deep Policy Gradients
Yuanyi Zhong
Yuanshuo Zhou
Jian-wei Peng
BDL
19
1
0
11 Jul 2021
Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning
Chang Tian
An Liu
Guang-Li Huang
Wu Luo
13
12
0
26 May 2021
Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark
Sharada Mohanty
Jyotish Poonganam
Adrien Gaidon
Andrey Kolobov
Blake Wulfe
...
Jacob Hilton
William H. Guss
Sahika Genc
John Schulman
K. Cobbe
23
22
0
29 Mar 2021
Continuous Doubly Constrained Batch Reinforcement Learning
Rasool Fakoor
Jonas W. Mueller
Kavosh Asadi
Pratik Chaudhari
Alex Smola
OffRL
204
27
0
18 Feb 2021
Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
Jiancong Huang
Juan Rojas
Matthieu Zimmer
Hongmin Wu
Y. Guan
Paul Weng
SSL
18
8
0
16 Oct 2020
Proximal Deterministic Policy Gradient
Marco Maggipinto
Gian Antonio Susto
Pratik Chaudhari
OffRL
6
5
0
03 Aug 2020
DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning
Rasool Fakoor
Pratik Chaudhari
Alex Smola
OffRL
14
4
0
26 Jun 2020
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Ashvin Nair
Abhishek Gupta
Murtaza Dalal
Sergey Levine
OffRL
OnRL
46
587
0
16 Jun 2020
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
T. Matsushima
Hiroki Furuta
Y. Matsuo
Ofir Nachum
S. Gu
OffRL
22
146
0
05 Jun 2020
MOReL : Model-Based Offline Reinforcement Learning
Rahul Kidambi
Aravind Rajeswaran
Praneeth Netrapalli
Thorsten Joachims
OffRL
23
654
0
12 May 2020
Adaptive Experience Selection for Policy Gradient
S. Mohamad
Giovanni Montana
33
0
0
17 Feb 2020
Meta-Q-Learning
Rasool Fakoor
Pratik Chaudhari
Stefano Soatto
Alex Smola
OffRL
25
145
0
30 Sep 2019
Trust Region-Guided Proximal Policy Optimization
Yuhui Wang
Hao He
Xiaoyang Tan
Yaozhong Gan
OffRL
12
55
0
29 Jan 2019