2311.08384
Cited By
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
14 November 2023
Yifei Zhou
Ayush Sekhari
Yuda Song
Wen Sun
OffRL
OnRL
Papers citing "Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees"
20 / 20 papers shown
Title
Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
Kai-Wen Zhao
Yi-An Ma
Jianye Hao
Jinyi Liu
Yan Zheng
Zhaopeng Meng
OffRL
OnRL
85
12
0
12 Jun 2023
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
Yuda Song
Yi Zhou
Ayush Sekhari
J. Andrew Bagnell
A. Krishnamurthy
Wen Sun
OffRL
OnRL
68
103
0
13 Oct 2022
Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
Xuezhou Zhang
Yuda Song
Masatoshi Uehara
Mengdi Wang
Alekh Agarwal
Wen Sun
OffRL
70
58
0
31 Jan 2022
Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
Dylan J. Foster
A. Krishnamurthy
D. Simchi-Levi
Yunzong Xu
OffRL
142
63
0
21 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
388
1,088
0
13 Oct 2021
Representation Learning for Online and Offline RL in Low-rank MDPs
Masatoshi Uehara
Xuezhou Zhang
Wen Sun
OffRL
115
129
0
09 Oct 2021
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
Masatoshi Uehara
Wen Sun
OffRL
141
150
0
13 Jul 2021
Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
Seunghyun Lee
Younggyo Seo
Kimin Lee
Pieter Abbeel
Jinwoo Shin
OffRL
OnRL
60
189
0
01 Jul 2021
Bellman-consistent Pessimism for Offline Reinforcement Learning
Tengyang Xie
Ching-An Cheng
Nan Jiang
Paul Mineiro
Alekh Agarwal
OffRL
LRM
145
276
0
13 Jun 2021
Bilinear Classes: A Structural Framework for Provable Generalization in RL
S. Du
Sham Kakade
Jason D. Lee
Shachar Lovett
G. Mahajan
Wen Sun
Ruosong Wang
OffRL
160
191
0
19 Mar 2021
Conservative Q-Learning for Offline Reinforcement Learning
Aviral Kumar
Aurick Zhou
George Tucker
Sergey Levine
OffRL
OnRL
137
1,812
0
08 Jun 2020
Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI
Christopher Berner
Greg Brockman
Brooke Chan
...
Szymon Sidor
Ilya Sutskever
Jie Tang
Filip Wolski
Susan Zhang
GNN
VLM
CLL
AI4CE
LRM
164
1,823
0
13 Dec 2019
Solving Rubik's Cube with a Robot Hand
OpenAI
Ilge Akkaya
Marcin Andrychowicz
Maciek Chociej
Mateusz Litwin
...
Peter Welinder
Lilian Weng
Qiming Yuan
Wojciech Zaremba
Lei Zhang
ODL
113
1,227
0
16 Oct 2019
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Aviral Kumar
Justin Fu
George Tucker
Sergey Levine
OffRL
OnRL
121
1,056
0
03 Jun 2019
Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen
Nan Jiang
OOD
OffRL
156
376
0
01 May 2019
Provably efficient RL with Rich Observations via Latent State Decoding
S. Du
A. Krishnamurthy
Nan Jiang
Alekh Agarwal
Miroslav Dudík
John Langford
OffRL
66
230
0
25 Jan 2019
Soft Actor-Critic Algorithms and Applications
Tuomas Haarnoja
Aurick Zhou
Kristian Hartikainen
George Tucker
Sehoon Ha
...
Vikash Kumar
Henry Zhu
Abhishek Gupta
Pieter Abbeel
Sergey Levine
136
2,425
0
13 Dec 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
496
19,019
0
20 Jul 2017
Online Nonparametric Regression
Alexander Rakhlin
Karthik Sridharan
174
101
0
11 Feb 2014
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view
B. Scherrer
78
102
0
19 Nov 2010