ResearchTrend.AI
Adaptive Approximate Policy Iteration

8 February 2020
Botao Hao
N. Lazić
Yasin Abbasi-Yadkori
Pooria Joulani
Csaba Szepesvári

Papers citing "Adaptive Approximate Policy Iteration"

22 / 22 papers shown
Learning Expected Reward for Switched Linear Control Systems: A Non-Asymptotic View
Muhammad Naeem
Miroslav Pajic
30
1
0
15 Jun 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
39
278
0
12 Dec 2019
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
121
104
0
15 Oct 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
76
549
0
11 Jul 2019
Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Daniel Russo
OffRL
26
82
0
07 Jun 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang
Mengdi Wang
OffRL
GP
50
284
0
24 May 2019
A Theory of Regularized Markov Decision Processes
Matthieu Geist
B. Scherrer
Olivier Pietquin
84
317
0
31 Jan 2019
Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
Kefan Dong
Yuanhao Wang
Xiaoyu Chen
Liwei Wang
OffRL
36
95
0
27 Jan 2019
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
52
801
0
10 Jul 2018
Maximum a Posteriori Policy Optimisation
A. Abdolmaleki
Jost Tobias Springenberg
Yuval Tassa
Rémi Munos
N. Heess
Martin Riedmiller
64
471
0
14 Jun 2018
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
M. S. Talebi
Odalric-Ambrym Maillard
47
72
0
05 Mar 2018
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
Ronan Fruit
Matteo Pirotta
A. Lazaric
R. Ortner
54
115
0
12 Feb 2018
Learning Unknown Markov Decision Processes: A Thompson Sampling Approach
Ouyang Yi
Mukul Gagrani
A. Nayyar
R. Jain
27
126
0
14 Sep 2017
A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds
Pooria Joulani
András György
Csaba Szepesvári
20
42
0
08 Sep 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
234
18,685
0
20 Jul 2017
Deep Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Daniel Russo
Zheng Wen
71
302
0
22 Mar 2017
Deep Reinforcement Learning with Double Q-learning
H. V. Hasselt
A. Guez
David Silver
OffRL
131
7,590
0
22 Sep 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
239
6,722
0
19 Feb 2015
Generalization and Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Zheng Wen
67
314
0
04 Feb 2014
Optimization, Learning, and Games with Predictable Sequences
Alexander Rakhlin
Karthik Sridharan
54
377
0
08 Nov 2013
Online Learning with Predictable Sequences
Alexander Rakhlin
Karthik Sridharan
112
355
0
18 Aug 2012
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs
Peter L. Bartlett
Ambuj Tewari
71
280
0
09 May 2012