Stochastic convex optimization with bandit feedback

Abstract
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
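The feedback model and the regret described in the abstract can be written out as follows. This is only a sketch under assumed notation: the symbols $f$ (the objective), $\mathcal{X}$ (the feasible set), $x_t$ (the query at round $t$), $y_t$ (the observed value), $\varepsilon_t$ (the noise) and $T$ (the number of queries) are chosen here for illustration, and the zero-mean noise condition is an assumption rather than a statement of the paper's exact conditions.

```latex
% Assumed notation: f convex and Lipschitz on a convex, compact set X;
% x_t is the point queried at round t, y_t the noisy value observed.
\begin{align*}
  y_t &= f(x_t) + \varepsilon_t,
      \qquad \mathbb{E}[\varepsilon_t] = 0
      && \text{(noisy bandit feedback, assumed zero-mean)} \\
  R_T &= \sum_{t=1}^{T} f(x_t) \;-\; T \min_{x \in \mathcal{X}} f(x)
       = \sum_{t=1}^{T} \Bigl( f(x_t) - \min_{x \in \mathcal{X}} f(x) \Bigr)
      && \text{(regret after } T \text{ queries)}
\end{align*}
```

Under this reading, every query contributes its own optimality gap to the regret, so an algorithm is charged for exploratory queries and not just for the quality of its final answer.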