Online learning in MDPs with linear function approximation and bandit feedback

3 July 2020

Julia Olkhovskaya

Papers citing "Online learning in MDPs with linear function approximation and bandit feedback"

Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees. Yongtao Wu, Luca Viano, Yihang Chen, Zhenyu Zhu, Kimon Antonakopoulos, Quanquan Gu, Volkan Cevher. 18 Feb 2025.
A Model Selection Approach for Corruption Robust Reinforcement Learning. Chen-Yu Wei, Christoph Dann, Julian Zimmert. 31 Dec 2024.
Logistic Q-Learning. Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu. 21 Oct 2020.
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs. Alekh Agarwal, Sham Kakade, A. Krishnamurthy, Wen Sun. 18 Jun 2020.
Provably Efficient Exploration in Policy Optimization. Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang. 12 Dec 2019.
Exploration-Enhanced POLITEX. Yasin Abbasi-Yadkori, N. Lazić, Csaba Szepesvári, Gellert Weisz. 27 Aug 2019.
Provably Efficient Reinforcement Learning with Linear Function Approximation. Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan. 11 Jul 2019.
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound. Lin F. Yang, Mengdi Wang. 24 May 2019.
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning. Christoph Dann, Tor Lattimore, Emma Brunskill. 22 Mar 2017.
An efficient algorithm for learning with semi-bandit feedback. Gergely Neu, Gábor Bartók. 13 May 2013.
On the Sample Complexity of Reinforcement Learning with a Generative Model. M. G. Azar, Rémi Munos, H. Kappen. 27 Jun 2012.