
Randomized Exploration in Generalized Linear Bandits

Abstract

We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. GLM-FPL fits a GLM to a randomly perturbed history of past rewards. We prove $\tilde{O}(d \sqrt{n \log K})$ bounds on the $n$-round regret of GLM-TSL and GLM-FPL, where $d$ is the number of features and $K$ is the number of arms. The regret bound of GLM-TSL improves upon prior work, and the regret bound of GLM-FPL is the first of its kind. We apply both GLM-TSL and GLM-FPL to logistic and neural network bandits, and show that they perform well empirically. In more complex models, GLM-FPL is significantly faster. Our results showcase the role of randomization, beyond sampling from the posterior, in exploration.
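The difference between the two exploration rules can be sketched for the logistic case. This is a minimal sketch under our own assumptions, not the authors' implementation. It assumes a logistic link, an L2 regularizer `lam`, a scale parameter `a` (used as the posterior-variance multiplier in GLM-TSL and as the standard deviation of Gaussian reward perturbations in GLM-FPL), and Newton's method for the regularized maximum-likelihood fit. All function and parameter names (`fit_glm`, `glm_tsl_choose`, `glm_fpl_choose`, `a`, `lam`) are hypothetical. GLM-TSL here draws $\tilde{\theta} \sim \mathcal{N}(\hat{\theta}, a^2 H^{-1})$, where $H$ is the Hessian of the regularized negative log-likelihood at $\hat{\theta}$. GLM-FPL adds independent $\mathcal{N}(0, a^2)$ noise to every past reward and refits.

```python
import math
import random

def sigmoid(v):
    # Numerically stable logistic link mu(v) = 1 / (1 + exp(-v)).
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def cholesky(A):
    # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def neg_log_lik(theta, X, y, lam):
    # L2-regularized logistic loss; y may be real-valued (perturbed rewards).
    total = 0.5 * lam * dot(theta, theta)
    for x, r in zip(X, y):
        v = dot(x, theta)
        total += max(v, 0.0) + math.log1p(math.exp(-abs(v))) - r * v
    return total

def grad_hess(theta, X, y, lam):
    # Gradient sum (mu - y) x + lam*theta and Hessian sum mu(1-mu) x x^T + lam*I.
    d = len(theta)
    g = [lam * t for t in theta]
    H = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x, r in zip(X, y):
        mu = sigmoid(dot(x, theta))
        for i in range(d):
            g[i] += (mu - r) * x[i]
            for j in range(d):
                H[i][j] += mu * (1.0 - mu) * x[i] * x[j]
    return g, H

def fit_glm(X, y, d, lam=1.0, iters=25):
    # Regularized MLE by Newton's method with backtracking; returns (theta_hat, Hessian).
    theta = [0.0] * d
    for _ in range(iters):
        g, H = grad_hess(theta, X, y, lam)
        step, f0, eta = solve(H, g), neg_log_lik(theta, X, y, lam), 1.0
        while True:
            cand = [t - eta * s for t, s in zip(theta, step)]
            if neg_log_lik(cand, X, y, lam) <= f0 or eta < 1e-8:
                break
            eta *= 0.5
        theta = cand
        if eta * max(abs(s) for s in step) < 1e-10:
            break
    return theta, grad_hess(theta, X, y, lam)[1]

def glm_tsl_choose(arms, X, y, d, a=1.0, lam=1.0, rng=random):
    # GLM-TSL-style step: sample theta ~ N(theta_hat, a^2 H^{-1}) (Laplace
    # approximation; the a^2 scaling is an assumed tuning knob), then act greedily.
    theta_hat, H = fit_glm(X, y, d, lam)
    L = cholesky(H)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Solve L^T v = z, so v = L^{-T} z has covariance (L L^T)^{-1} = H^{-1}.
    v = solve([[L[j][i] for j in range(d)] for i in range(d)], z)
    theta = [t + a * vi for t, vi in zip(theta_hat, v)]
    return max(range(len(arms)), key=lambda k: dot(arms[k], theta))

def glm_fpl_choose(arms, X, y, d, a=1.0, lam=1.0, rng=random):
    # GLM-FPL-style step: add N(0, a^2) noise to every past reward (Gaussian
    # perturbations are an assumption here), refit the GLM, then act greedily.
    y_tilde = [r + rng.gauss(0.0, a) for r in y]
    theta, _ = fit_glm(X, y_tilde, d, lam)
    return max(range(len(arms)), key=lambda k: dot(arms[k], theta))
```

To run either rule, call it once per round with the arms' feature vectors, the features `X` and rewards `y` observed so far, and the dimension `d`, then append the pulled arm's features and its 0/1 reward before the next call. In this sketch GLM-FPL only refits the model and never factors the Hessian, which is consistent with the abstract's observation that GLM-FPL is faster in more complex models, where forming a Laplace approximation is costly.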
