ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
arXiv:1905.11817 · 28 May 2019
Julian Zimmert, Tor Lattimore

Papers citing "Connections Between Mirror Descent, Thompson Sampling and the Information Ratio"

13 / 13 papers shown
A Simple and Adaptive Learning Rate for FTRL in Online Learning with Minimax Regret of $Θ(T^{2/3})$ and its Application to Best-of-Both-Worlds
Taira Tsuchiya, Shinji Ito
30 May 2024

On the Minimax Regret for Online Learning with Feedback Graphs
Khaled Eldowa, Emmanuel Esposito, Tommaso Cesari, Nicolò Cesa-Bianchi
24 May 2023

Bayesian Reinforcement Learning with Limited Cognitive Load
Dilip Arumugam, Mark K. Ho, Noah D. Goodman, Benjamin Van Roy
OffRL
05 May 2023

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond
Christoph Dann, Chen-Yu Wei, Julian Zimmert
20 Feb 2023

STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha
28 Jan 2023

Anytime-valid off-policy inference for contextual bandits
Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro
OffRL
19 Oct 2022

Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds
Shinji Ito, Taira Tsuchiya, Junya Honda
AAML
14 Jun 2022

Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret
Tor Lattimore
22 Feb 2022

Gaussian Imagination in Bandit Learning
Yueyang Liu, Adithya M. Devraj, Benjamin Van Roy, Kuang Xu
06 Jan 2022

The Value of Information When Deciding What to Learn
Dilip Arumugam, Benjamin Van Roy
26 Oct 2021

Reinforcement Learning, Bit by Bit
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, M. Ibrahimi, Ian Osband, Zheng Wen
06 Mar 2021

Exploration by Optimisation in Partial Monitoring
Tor Lattimore, Csaba Szepesvári
12 Jul 2019

Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits
Julian Zimmert, Yevgeny Seldin
AAML
19 Jul 2018