2007.01612
Cited By
v1
v2 (latest)
Online learning in MDPs with linear function approximation and bandit feedback
3 July 2020
Gergely Neu
Julia Olkhovskaya
Papers citing "Online learning in MDPs with linear function approximation and bandit feedback"
11 / 11 papers shown
Title
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Yongtao Wu, Luca Viano, Yihang Chen, Zhenyu Zhu, Kimon Antonakopoulos, Quanquan Gu, Volkan Cevher
133  1  0
18 Feb 2025

A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei, Christoph Dann, Julian Zimmert
142  45  0
31 Dec 2024

Logistic Q-Learning
Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu
72  40  0
21 Oct 2020

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
Alekh Agarwal, Sham Kakade, A. Krishnamurthy, Wen Sun
OffRL
165  226  0
18 Jun 2020

Provably Efficient Exploration in Policy Optimization
Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang
60  281  0
12 Dec 2019

Exploration-Enhanced POLITEX
Yasin Abbasi-Yadkori, N. Lazić, Csaba Szepesvári, Gellert Weisz
52  23  0
27 Aug 2019

Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan
98  559  0
11 Jul 2019

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang, Mengdi Wang
OffRL, GP
62  286  0
24 May 2019

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann, Tor Lattimore, Emma Brunskill
76  309  0
22 Mar 2017

An efficient algorithm for learning with semi-bandit feedback
Gergely Neu, Gábor Bartók
119  80  0
13 May 2013

On the Sample Complexity of Reinforcement Learning with a Generative Model
M. G. Azar, Rémi Munos, H. Kappen
74  156  0
27 Jun 2012