ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

P3O: Policy-on Policy-off Policy Optimization

5 May 2019
Rasool Fakoor
Pratik Chaudhari
Alex Smola
    OffRL

Papers citing "P3O: Policy-on Policy-off Policy Optimization"

33 / 33 papers shown
Teaching Large Language Models to Reason through Learning and Forgetting
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLM
CLL
LRM
186
0
0
15 Apr 2025
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
S. Sarkar
49
0
0
21 Feb 2025
AlphaRouter: Quantum Circuit Routing with Reinforcement Learning and Tree Search
Wei Tang
Yiheng Duan
Yaroslav Kharkov
Rasool Fakoor
Eric Kessler
Yunong Shi
45
2
0
07 Oct 2024
SAPG: Split and Aggregate Policy Gradients
Jayesh Singla
Ananye Agarwal
Deepak Pathak
OffRL
OnRL
42
3
0
29 Jul 2024
Learning the Target Network in Function Space
Kavosh Asadi
Yao Liu
Shoham Sabach
Ming Yin
Rasool Fakoor
43
0
0
03 Jun 2024
Knowledge Transfer for Cross-Domain Reinforcement Learning: A Systematic Review
Sergio A. Serrano
J. Martínez-Carranza
L. Sucar
36
0
0
26 Apr 2024
Revisiting Experience Replayable Conditions
Taisuke Kobayashi
32
3
0
15 Feb 2024
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
Nicholas Corrado
Josiah P. Hanna
OffRL
20
1
0
14 Nov 2023
Budgeting Counterfactual for Offline RL
Yao Liu
Pratik Chaudhari
Rasool Fakoor
OffRL
25
2
0
12 Jul 2023
Distillation Policy Optimization
Jianfei Ma
OffRL
26
1
0
01 Feb 2023
Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning
Haoxuan Pan
Deheng Ye
Xiaoming Duan
Qiang Fu
Wei Yang
Jianping He
Mingfei Sun
OffRL
25
2
0
20 Jan 2023
Time-Varying Propensity Score to Bridge the Gap between the Past and Present
Rasool Fakoor
Jonas W. Mueller
Zachary Chase Lipton
Pratik Chaudhari
Alexander J. Smola
OOD
AI4TS
32
3
0
04 Oct 2022
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
32
2
0
28 Jun 2022
Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges
Massimo Caccia
Jonas W. Mueller
Taesup Kim
Laurent Charlin
Rasool Fakoor
CLL
32
8
0
28 May 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
41
8
0
20 May 2022
Learning to Constrain Policy Optimization with Virtual Trust Region
Hung Le
Thommen Karimpanal George
Majid Abdolshah
D. Nguyen
Kien Do
Sunil R. Gupta
Svetha Venkatesh
30
3
0
20 Apr 2022
Faster Deep Reinforcement Learning with Slower Online Network
Kavosh Asadi
Rasool Fakoor
Omer Gottesman
Taesup Kim
Michael L. Littman
Alexander J. Smola
OnRL
13
6
0
10 Dec 2021
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka
Tim Welschehold
Joschka Boedecker
Wolfram Burgard
OffRL
30
9
0
24 Nov 2021
Generalized Proximal Policy Optimization with Sample Reuse
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
34
47
0
29 Oct 2021
Cautious Actor-Critic
Lingwei Zhu
Toshinori Kitamura
Takamitsu Matsubara
AAML
33
1
0
12 Jul 2021
Coordinate-wise Control Variates for Deep Policy Gradients
Yuanyi Zhong
Yuanshuo Zhou
Jian-wei Peng
BDL
19
1
0
11 Jul 2021
Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning
Chang Tian
An Liu
Guang-Li Huang
Wu Luo
13
12
0
26 May 2021
Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark
Sharada Mohanty
Jyotish Poonganam
Adrien Gaidon
Andrey Kolobov
Blake Wulfe
...
Jacob Hilton
William H. Guss
Sahika Genc
John Schulman
K. Cobbe
23
22
0
29 Mar 2021
Continuous Doubly Constrained Batch Reinforcement Learning
Rasool Fakoor
Jonas W. Mueller
Kavosh Asadi
Pratik Chaudhari
Alex Smola
OffRL
204
27
0
18 Feb 2021
Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
Jiancong Huang
Juan Rojas
Matthieu Zimmer
Hongmin Wu
Y. Guan
Paul Weng
SSL
18
8
0
16 Oct 2020
Proximal Deterministic Policy Gradient
Marco Maggipinto
Gian Antonio Susto
Pratik Chaudhari
OffRL
6
5
0
03 Aug 2020
DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning
Rasool Fakoor
Pratik Chaudhari
Alex Smola
OffRL
14
4
0
26 Jun 2020
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Ashvin Nair
Abhishek Gupta
Murtaza Dalal
Sergey Levine
OffRL
OnRL
46
587
0
16 Jun 2020
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
T. Matsushima
Hiroki Furuta
Y. Matsuo
Ofir Nachum
S. Gu
OffRL
22
146
0
05 Jun 2020
MOReL : Model-Based Offline Reinforcement Learning
Rahul Kidambi
Aravind Rajeswaran
Praneeth Netrapalli
Thorsten Joachims
OffRL
23
654
0
12 May 2020
Adaptive Experience Selection for Policy Gradient
S. Mohamad
Giovanni Montana
33
0
0
17 Feb 2020
Meta-Q-Learning
Rasool Fakoor
Pratik Chaudhari
Stefano Soatto
Alex Smola
OffRL
25
145
0
30 Sep 2019
Trust Region-Guided Proximal Policy Optimization
Yuhui Wang
Hao He
Xiaoyang Tan
Yaozhong Gan
OffRL
12
55
0
29 Jan 2019