Meta-Learning Bandit Policies by Gradient Ascent

9 June 2020
Branislav Kveton
Martin Mladenov
Chih-Wei Hsu
Manzil Zaheer
Csaba Szepesvári
Craig Boutilier

Papers citing "Meta-Learning Bandit Policies by Gradient Ascent"

22 / 22 papers shown
Meta-learning with Stochastic Linear Bandits
Leonardo Cella
A. Lazaric
Massimiliano Pontil
FedML
52
57
0
18 May 2020
On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
135
292
0
13 May 2020
Provable Meta-Learning of Linear Representations
Nilesh Tripuraneni
Chi Jin
Michael I. Jordan
OOD
106
191
0
26 Feb 2020
Differentiable Bandit Exploration
Craig Boutilier
Chih-Wei Hsu
Branislav Kveton
Martin Mladenov
Csaba Szepesvári
Manzil Zaheer
BDL, OffRL
50
7
0
17 Feb 2020
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
69
321
0
01 Aug 2019
Meta-learning of Sequential Strategies
Pedro A. Ortega
Jane X. Wang
Mark Rowland
Tim Genewein
Z. Kurth-Nelson
...
Yee Whye Teh
H. V. Hasselt
Nando de Freitas
M. Botvinick
Shane Legg
OffRL
110
99
0
08 May 2019
Empirical Bayes Regret Minimization
Chih-Wei Hsu
Branislav Kveton
Ofer Meshi
Martin Mladenov
Csaba Szepesvári
58
13
0
04 Apr 2019
Perturbed-History Exploration in Stochastic Linear Bandits
Branislav Kveton
Csaba Szepesvári
Mohammad Ghavamzadeh
Craig Boutilier
36
42
0
21 Mar 2019
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Branislav Kveton
Csaba Szepesvári
Sharan Vaswani
Zheng Wen
Mohammad Ghavamzadeh
Tor Lattimore
142
70
0
13 Nov 2018
Probabilistic Model-Agnostic Meta-Learning
Chelsea Finn
Kelvin Xu
Sergey Levine
BDL
271
671
0
07 Jun 2018
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
C. Riquelme
George Tucker
Jasper Snoek
BDL
71
366
0
26 Feb 2018
Multi-Task Learning for Contextual Bandits
A. Deshmukh
Ürün Dogan
Clayton Scott
126
94
0
24 May 2017
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
823
11,909
0
09 Mar 2017
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
Yan Duan
John Schulman
Xi Chen
Peter L. Bartlett
Ilya Sutskever
Pieter Abbeel
OffRL
96
1,019
0
09 Nov 2016
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,776
0
19 Feb 2015
Algorithms for multi-armed bandit problems
Volodymyr Kuleshov
Doina Precup
143
350
0
25 Feb 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
Alekh Agarwal
Daniel J. Hsu
Satyen Kale
John Langford
Lihong Li
Robert Schapire
OffRL
394
509
0
04 Feb 2014
Thompson Sampling for Contextual Bandits with Linear Payoffs
Shipra Agrawal
Navin Goyal
195
1,000
0
15 Sep 2012
Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Francis Maes
D. Ernst
L. Wehenkel
70
24
0
22 Jul 2012
Infinite-Horizon Policy-Gradient Estimation
Jonathan Baxter
Peter L. Bartlett
100
811
0
03 Jun 2011
A Model of Inductive Bias Learning
Jonathan Baxter
109
1,214
0
01 Jun 2011
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li
Wei Chu
John Langford
Robert Schapire
469
2,951
0
28 Feb 2010