arXiv: 2006.05094 (v1; v2 is the latest version)
Meta-Learning Bandit Policies by Gradient Ascent
9 June 2020
Branislav Kveton
Martin Mladenov
Chih-Wei Hsu
Manzil Zaheer
Csaba Szepesvári
Craig Boutilier
Papers citing "Meta-Learning Bandit Policies by Gradient Ascent"
Meta-learning with Stochastic Linear Bandits
Leonardo Cella
A. Lazaric
Massimiliano Pontil
FedML
52
57
0
18 May 2020
On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
135
292
0
13 May 2020
Provable Meta-Learning of Linear Representations
Nilesh Tripuraneni
Chi Jin
Michael I. Jordan
OOD
106
191
0
26 Feb 2020
Differentiable Bandit Exploration
Craig Boutilier
Chih-Wei Hsu
Branislav Kveton
Martin Mladenov
Csaba Szepesvári
Manzil Zaheer
BDL
OffRL
50
7
0
17 Feb 2020
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
69
321
0
01 Aug 2019
Meta-learning of Sequential Strategies
Pedro A. Ortega
Jane X. Wang
Mark Rowland
Tim Genewein
Z. Kurth-Nelson
...
Yee Whye Teh
H. V. Hasselt
Nando de Freitas
M. Botvinick
Shane Legg
OffRL
110
99
0
08 May 2019
Empirical Bayes Regret Minimization
Chih-Wei Hsu
Branislav Kveton
Ofer Meshi
Martin Mladenov
Csaba Szepesvári
58
13
0
04 Apr 2019
Perturbed-History Exploration in Stochastic Linear Bandits
Branislav Kveton
Csaba Szepesvári
Mohammad Ghavamzadeh
Craig Boutilier
36
42
0
21 Mar 2019
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Branislav Kveton
Csaba Szepesvári
Sharan Vaswani
Zheng Wen
Mohammad Ghavamzadeh
Tor Lattimore
142
70
0
13 Nov 2018
Probabilistic Model-Agnostic Meta-Learning
Chelsea Finn
Kelvin Xu
Sergey Levine
BDL
271
671
0
07 Jun 2018
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
C. Riquelme
George Tucker
Jasper Snoek
BDL
71
366
0
26 Feb 2018
Multi-Task Learning for Contextual Bandits
A. Deshmukh
Ürün Dogan
Clayton Scott
126
94
0
24 May 2017
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
823
11,909
0
09 Mar 2017
RL²: Fast Reinforcement Learning via Slow Reinforcement Learning
Yan Duan
John Schulman
Xi Chen
Peter L. Bartlett
Ilya Sutskever
Pieter Abbeel
OffRL
96
1,019
0
09 Nov 2016
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,776
0
19 Feb 2015
Algorithms for multi-armed bandit problems
Volodymyr Kuleshov
Doina Precup
143
350
0
25 Feb 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
Alekh Agarwal
Daniel J. Hsu
Satyen Kale
John Langford
Lihong Li
Robert Schapire
OffRL
394
509
0
04 Feb 2014
Thompson Sampling for Contextual Bandits with Linear Payoffs
Shipra Agrawal
Navin Goyal
195
1,000
0
15 Sep 2012
Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Francis Maes
D. Ernst
L. Wehenkel
70
24
0
22 Jul 2012
Infinite-Horizon Policy-Gradient Estimation
Jonathan Baxter
Peter L. Bartlett
100
811
0
03 Jun 2011
A Model of Inductive Bias Learning
Jonathan Baxter
109
1,214
0
01 Jun 2011
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li
Wei Chu
John Langford
Robert Schapire
469
2,951
0
28 Feb 2010