A Dimension-free Algorithm for Contextual Continuum-armed Bandits

Abstract

In contextual continuum-armed bandits, the contexts $x$ and the arms $y$ are both continuous and drawn from high-dimensional spaces. The payoff function to be learned, $f(x,y)$, does not have a particular parametric form. The literature has shown that for Lipschitz-continuous functions, the optimal regret is $\tilde{O}(T^{\frac{d_x+d_y+1}{d_x+d_y+2}})$, where $d_x$ and $d_y$ are the dimensions of the contexts and the arms, respectively; the optimal regret thus suffers from the curse of dimensionality. We develop an algorithm that achieves regret $\tilde{O}(T^{\frac{d_x+1}{d_x+2}})$ when $f$ is globally concave in $y$. Global concavity is a common assumption in many applications. The algorithm is based on stochastic approximation and estimates the gradient information in an online fashion. Our results show that the curse of dimensionality in the arms can be overcome when the payoff function has some mild structure.
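To give a feel for the stochastic-approximation idea mentioned in the abstract, the following is a minimal sketch of Kiefer-Wolfowitz-style finite-difference gradient ascent on a noisy concave payoff. It is not the paper's algorithm: it ignores contexts entirely, and the payoff function, its maximizer, the step-size and perturbation schedules, and all function names are hypothetical choices made only for illustration.

```python
import random


def noisy_payoff(y, rng, noise=0.1):
    # Hypothetical concave payoff with maximizer (0.3, 0.7), observed with Gaussian noise.
    target = (0.3, 0.7)
    value = -sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    return value + rng.gauss(0.0, noise)


def clip(v, lo=0.0, hi=1.0):
    # Keep each arm coordinate inside the assumed arm space [0, 1].
    return max(lo, min(hi, v))


def kiefer_wolfowitz(payoff, y0, rounds, rng):
    # Gradient ascent where each partial derivative is estimated from
    # two noisy payoff evaluations (a finite difference).
    y = list(y0)
    for n in range(1, rounds + 1):
        a_n = 1.0 / n           # assumed step-size schedule
        c_n = 0.5 / n ** 0.25   # assumed perturbation-width schedule
        grad = []
        for i in range(len(y)):
            up = list(y)
            up[i] = clip(y[i] + c_n)
            down = list(y)
            down[i] = clip(y[i] - c_n)
            width = up[i] - down[i]  # positive because 0 < c_n <= 0.5
            grad.append((payoff(up, rng) - payoff(down, rng)) / width)
        y = [clip(yi + a_n * gi) for yi, gi in zip(y, grad)]
    return y


rng = random.Random(0)
estimate = kiefer_wolfowitz(noisy_payoff, [0.9, 0.1], 2000, rng)
```

In this sketch the iterate moves toward the maximizer using only noisy payoff observations, without discretizing the arm space, which is the kind of structure concavity makes available.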
