We consider a contextual online learning (multi-armed bandit) problem with a high-dimensional covariate $\mathbf{x}$ and decision $y$. The reward function to learn, $f(\mathbf{x}, y)$, does not have a particular parametric form. The literature has shown that the optimal regret is $\tilde{O}(T^{(d_x+d_y+1)/(d_x+d_y+2)})$, where $d_x$ and $d_y$ are the dimensions of $\mathbf{x}$ and $y$, and thus it suffers from the curse of dimensionality. In many applications, only a small subset of variables in the covariate affect the value of $f$, which is referred to as \textit{sparsity} in statistics. To take advantage of the sparsity structure of the covariate, we propose a variable selection algorithm called \textit{BV-LASSO}, which incorporates novel ideas such as binning and voting to apply LASSO to nonparametric settings. Our algorithm achieves the regret $\tilde{O}(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)})$, where $d_x^*$ is the effective covariate dimension. The regret matches the optimal regret when the covariate is $d_x^*$-dimensional and thus cannot be improved. Our algorithm may serve as a general recipe to achieve dimension reduction via variable selection in nonparametric settings.
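The abstract only names the ingredients of BV-LASSO (binning the covariate space, fitting LASSO locally, and voting across bins on which coordinates matter); the sketch below is an illustrative rendering of that idea, not the paper's algorithm. The function name \texttt{bv\_lasso\_select}, the use of k-means as a stand-in for spatial binning, and all parameter values are assumptions made for this example.

\begin{verbatim}
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def bv_lasso_select(X, r, n_bins=8, alpha=0.02, vote_frac=0.5):
    """Illustrative binning-and-voting variable selection (not the paper's code).

    X : (n, d_x) covariate matrix, r : (n,) observed rewards.
    Bins are formed by k-means clustering of the covariates as a stand-in
    for spatial binning; within each bin the reward is treated as roughly
    linear, so a local LASSO flags the coordinates that matter there.
    Coordinates selected in a large enough fraction of bins win the vote.
    """
    n, d = X.shape
    labels = KMeans(n_clusters=n_bins, n_init=10, random_state=0).fit_predict(X)
    votes = np.zeros(d)
    for b in range(n_bins):
        idx = labels == b
        if idx.sum() < d + 1:        # skip bins with too few samples
            continue
        coef = Lasso(alpha=alpha).fit(X[idx], r[idx]).coef_
        votes += (np.abs(coef) > 1e-8)  # one vote per bin for each nonzero coordinate
    return np.where(votes >= vote_frac * n_bins)[0]

# Toy usage: rewards depend only on the first two of ten covariates.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 10))
r = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(2000)
print(bv_lasso_select(X, r))  # should usually recover the informative coordinates 0 and 1
\end{verbatim}

In this toy setting the voting step is what makes local LASSO fits usable nonparametrically: no single bin needs to identify every relevant coordinate, only a sufficient share of them.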