
Meta-Learning Bandit Policies by Gradient Ascent

9 June 2020

Papers cited by "Meta-Learning Bandit Policies by Gradient Ascent"


Meta-learning with Stochastic Linear Bandits. Leonardo Cella, A. Lazaric, Massimiliano Pontil. 18 May 2020.
On the Global Convergence Rates of Softmax Policy Gradient Methods. Jincheng Mei, Chenjun Xiao, Csaba Szepesvári, Dale Schuurmans. 13 May 2020.
Provable Meta-Learning of Linear Representations. Nilesh Tripuraneni, Chi Jin, Michael I. Jordan. 26 Feb 2020.
Differentiable Bandit Exploration. Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvári, Manzil Zaheer. 17 Feb 2020.
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift. Alekh Agarwal, Sham Kakade, Jason D. Lee, G. Mahajan. 01 Aug 2019.
Meta-learning of Sequential Strategies. Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Z. Kurth-Nelson, ..., Yee Whye Teh, H. van Hasselt, Nando de Freitas, M. Botvinick, Shane Legg. 08 May 2019.
Empirical Bayes Regret Minimization. Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvári. 04 Apr 2019.
Perturbed-History Exploration in Stochastic Linear Bandits. Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier. 21 Mar 2019.
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. Branislav Kveton, Csaba Szepesvári, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore. 13 Nov 2018.
Probabilistic Model-Agnostic Meta-Learning. Chelsea Finn, Kelvin Xu, Sergey Levine. 07 Jun 2018.
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling. C. Riquelme, George Tucker, Jasper Snoek. 26 Feb 2018.
Multi-Task Learning for Contextual Bandits. A. Deshmukh, Ürün Dogan, Clayton Scott. 24 May 2017.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Chelsea Finn, Pieter Abbeel, Sergey Levine. 09 Mar 2017.
RL²: Fast Reinforcement Learning via Slow Reinforcement Learning. Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel. 09 Nov 2016.
Trust Region Policy Optimization. John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. 19 Feb 2015.
Algorithms for multi-armed bandit problems. Volodymyr Kuleshov, Doina Precup. 25 Feb 2014.
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. Alekh Agarwal, Daniel J. Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire. 04 Feb 2014.
Thompson Sampling for Contextual Bandits with Linear Payoffs. Shipra Agrawal, Navin Goyal. 15 Sep 2012.
Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case. Francis Maes, D. Ernst, L. Wehenkel. 22 Jul 2012.
Infinite-Horizon Policy-Gradient Estimation. Jonathan Baxter, Peter L. Bartlett. 03 Jun 2011.
A Model of Inductive Bias Learning. Jonathan Baxter. 01 Jun 2011.
A Contextual-Bandit Approach to Personalized News Article Recommendation. Lihong Li, Wei Chu, John Langford, Robert Schapire. 28 Feb 2010.