Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

23 July 2020
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Rahul Jain
arXiv:2007.11849

Papers citing "Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation"

26 / 26 papers shown
Improved Analysis of UCRL2 with Empirical Bernstein Inequality
Ronan Fruit
Matteo Pirotta
A. Lazaric
34
33
0
10 Jul 2020
Online learning in MDPs with linear function approximation and bandit feedback
Gergely Neu
Julia Olkhovskaya
36
32
0
03 Jul 2020
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension
Ruosong Wang
Ruslan Salakhutdinov
Lin F. Yang
62
55
0
21 May 2020
Learning Near Optimal Policies with Low Inherent Bellman Error
Andrea Zanette
A. Lazaric
Mykel Kochenderfer
Emma Brunskill
OffRL
71
222
0
29 Feb 2020
Adaptive Approximate Policy Iteration
Botao Hao
N. Lazić
Yasin Abbasi-Yadkori
Pooria Joulani
Csaba Szepesvári
61
14
0
08 Feb 2020
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
132
106
0
15 Oct 2019
$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank
Kefan Dong
Jian Peng
Yining Wang
Yuan Zhou
OffRL
53
36
0
05 Sep 2019
Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
Lingxiao Wang
Qi Cai
Zhuoran Yang
Zhaoran Wang
85
241
0
29 Aug 2019
Exploration-Enhanced POLITEX
Yasin Abbasi-Yadkori
N. Lazić
Csaba Szepesvári
Gellert Weisz
52
23
0
27 Aug 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
69
321
0
01 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
96
557
0
11 Jul 2019
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Boyi Liu
Qi Cai
Zhuoran Yang
Zhaoran Wang
73
111
0
25 Jun 2019
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
Zihan Zhang
Xiangyang Ji
60
72
0
12 Jun 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang
Mengdi Wang
OffRL
GP
62
286
0
24 May 2019
Regret Bounds for Reinforcement Learning via Markov Chain Concentration
R. Ortner
67
46
0
06 Aug 2018
Scalable Bilinear $π$ Learning Using State and Action Features
Yichen Chen
Lihong Li
Mengdi Wang
64
46
0
27 Apr 2018
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
M. S. Talebi
Odalric-Ambrym Maillard
56
72
0
05 Mar 2018
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
Ronan Fruit
Matteo Pirotta
A. Lazaric
R. Ortner
86
116
0
12 Feb 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
517
19,065
0
20 Jul 2017
A unified view of entropy-regularized Markov decision processes
Gergely Neu
Anders Jonsson
Vicenç Gómez
97
263
0
22 May 2017
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
Mehdi Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
199
8,859
0
04 Feb 2016
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,776
0
19 Feb 2015
Generalization and Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Zheng Wen
79
314
0
04 Feb 2014
Volumetric Spanners: an Efficient Exploration Basis for Learning
Elad Hazan
Zohar Karnin
Raghu Meka
255
97
0
21 Dec 2013
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs
Peter L. Bartlett
Ambuj Tewari
91
284
0
09 May 2012
Towards minimax policies for online linear optimization with bandit feedback
Sébastien Bubeck
Nicolò Cesa-Bianchi
Sham Kakade
OffRL
283
150
0
14 Feb 2012