Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP

29 January 2021

Papers citing "Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP"

Title
Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance S. Iwazaki Shion Takeno 76 1 0 10 Feb 2025
Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits H. Bui Enrique Mallada Anqi Liu 106 0 0 08 Nov 2024
Variance-Dependent Regret Bounds for Non-stationary Linear Bandits Zhiyong Wang Jize Xie Yi Chen J. C. Lui Dongruo Zhou 28 0 0 15 Mar 2024
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes Han Zhong Tong Zhang 32 26 0 15 May 2023
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments Runlong Zhou Zihan Zhang S. Du 44 10 0 31 Jan 2023
SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits Subhojyoti Mukherjee Qiaomin Xie Josiah P. Hanna R. Nowak OffRL 45 5 0 29 Jan 2023
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes Jiafan He Heyang Zhao Dongruo Zhou Quanquan Gu OffRL 51 53 0 12 Dec 2022
Learning Stochastic Shortest Path with Linear Function Approximation Yifei Min Jiafan He Tianhao Wang Quanquan Gu 39 30 0 25 Oct 2021
UCB Momentum Q-learning: Correcting the bias without forgetting Pierre Menard O. D. Domingues Xuedong Shang Michal Valko 79 40 0 01 Mar 2021
Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits Marc Abeille Louis Faury Clément Calauzènes 96 37 0 23 Oct 2020
Optimism in Reinforcement Learning with Generalized Linear Function Approximation Yining Wang Ruosong Wang S. Du A. Krishnamurthy 132 135 0 09 Dec 2019