Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

28 May 2019

Papers citing "Connections Between Mirror Descent, Thompson Sampling and the Information Ratio"

Title
A Simple and Adaptive Learning Rate for FTRL in Online Learning with Minimax Regret of $Θ(T^{2/3})$ and its Application to Best-of-Both-Worlds Taira Tsuchiya Shinji Ito 26 0 0 30 May 2024
On the Minimax Regret for Online Learning with Feedback Graphs Khaled Eldowa Emmanuel Esposito Tommaso Cesari Nicolò Cesa-Bianchi 33 8 0 24 May 2023
Bayesian Reinforcement Learning with Limited Cognitive Load Dilip Arumugam Mark K. Ho Noah D. Goodman Benjamin Van Roy OffRL 34 8 0 05 May 2023
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond Christoph Dann Chen-Yu Wei Julian Zimmert 26 22 0 20 Feb 2023
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning Souradip Chakraborty Amrit Singh Bedi Alec Koppel Mengdi Wang Furong Huang Dinesh Manocha 24 7 0 28 Jan 2023
Anytime-valid off-policy inference for contextual bandits Ian Waudby-Smith Lili Wu Aaditya Ramdas Nikos Karampatziakis Paul Mineiro OffRL 43 25 0 19 Oct 2022
Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds Shinji Ito Taira Tsuchiya Junya Honda AAML 23 16 0 14 Jun 2022
Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret Tor Lattimore 22 16 0 22 Feb 2022
Gaussian Imagination in Bandit Learning Yueyang Liu Adithya M. Devraj Benjamin Van Roy Kuang Xu 40 7 0 06 Jan 2022
The Value of Information When Deciding What to Learn Dilip Arumugam Benjamin Van Roy 37 12 0 26 Oct 2021
Reinforcement Learning, Bit by Bit Xiuyuan Lu Benjamin Van Roy Vikranth Dwaracherla M. Ibrahimi Ian Osband Zheng Wen 30 70 0 06 Mar 2021
Exploration by Optimisation in Partial Monitoring Tor Lattimore Csaba Szepesvári 33 38 0 12 Jul 2019
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits Julian Zimmert Yevgeny Seldin AAML 24 175 0 19 Jul 2018