Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design

Abstract

Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints on linear contextual bandits, a central problem in online active learning. We consider two popular limited-adaptivity models in the literature: batch learning and rare policy switches. We show that, when the context vectors are adversarially chosen in $d$-dimensional linear contextual bandits, the learner needs $\Omega(d \log T / \log(d \log T))$ policy switches to achieve the minimax-optimal expected regret, almost matching the $O(d \log T)$ upper bound by Abbasi-Yadkori et al. [2011]; for stochastic context vectors, even in the more restricted batch learning model, only $O(\log \log T)$ batches are needed to achieve the optimal regret. Together with known results in the literature, our results present a complete picture of the adaptivity constraints in linear contextual bandits. Along the way, we propose \emph{distributional optimal design}, a natural extension of optimal experiment design, and provide a sample-efficient learning algorithm for the problem, which may be of independent interest.
