arXiv:2007.01612
Online learning in MDPs with linear function approximation and bandit feedback

3 July 2020
Gergely Neu
Julia Olkhovskaya

Papers citing "Online learning in MDPs with linear function approximation and bandit feedback"

11 / 11 papers shown
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Yongtao Wu
Luca Viano
Yihang Chen
Zhenyu Zhu
Kimon Antonakopoulos
Quanquan Gu
Volkan Cevher
133
1
0
18 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
142
45
0
31 Dec 2024
Logistic Q-Learning
Joan Bas-Serrano
Sebastian Curi
Andreas Krause
Gergely Neu
72
40
0
21 Oct 2020
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
Alekh Agarwal
Sham Kakade
A. Krishnamurthy
Wen Sun
OffRL
165
226
0
18 Jun 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
58
281
0
12 Dec 2019
Exploration-Enhanced POLITEX
Yasin Abbasi-Yadkori
N. Lazić
Csaba Szepesvári
Gellert Weisz
52
23
0
27 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
98
557
0
11 Jul 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang
Mengdi Wang
OffRL, GP
62
286
0
24 May 2019
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann
Tor Lattimore
Emma Brunskill
76
309
0
22 Mar 2017
An efficient algorithm for learning with semi-bandit feedback
Gergely Neu
Gábor Bartók
119
80
0
13 May 2013
On the Sample Complexity of Reinforcement Learning with a Generative Model
M. G. Azar
Rémi Munos
H. Kappen
74
156
0
27 Jun 2012