Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

13 July 2021
Masatoshi Uehara
Wen Sun
    OffRL

Papers citing "Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage"

13 / 63 papers shown
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles
Aditya Modi
Nan Jiang
Ambuj Tewari
Satinder Singh
70
132
0
23 Oct 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
72
321
0
01 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
100
560
0
11 Jul 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang
Mengdi Wang
OffRL, GP
66
288
0
24 May 2019
Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen
Nan Jiang
OOD, OffRL
161
378
0
01 May 2019
A Theoretical Analysis of Deep Q-Learning
Jianqing Fan
Zhuoran Yang
Yuchen Xie
Zhaoran Wang
190
606
0
01 Jan 2019
Off-Policy Deep Reinforcement Learning without Exploration
Scott Fujimoto
David Meger
Doina Precup
OffRL, BDL
246
1,624
0
07 Dec 2018
Top-K Off-Policy Correction for a REINFORCE Recommender System
Minmin Chen
Alex Beutel
Paul Covington
Sagar Jain
Francois Belletti
Ed H. Chi
CML, OffRL
117
482
0
06 Dec 2018
The total variation distance between high-dimensional Gaussians with the same mean
Luc Devroye
Abbas Mehrabian
Tommy Reddad
72
230
0
19 Oct 2018
Finite-Time Analysis of Kernelised Contextual Bandits
Michal Valko
N. Korda
Rémi Munos
I. Flaounas
N. Cristianini
188
275
0
26 Sep 2013
Policy Iteration for Factored MDPs
D. Koller
Ronald E. Parr
OffRL
320
179
0
16 Jan 2013
Learning to Optimize Via Posterior Sampling
Daniel Russo
Benjamin Van Roy
206
703
0
11 Jan 2013
Agnostic System Identification for Model-Based Reinforcement Learning
Stéphane Ross
Drew Bagnell
83
146
0
05 Mar 2012