Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Abstract

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
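
To make the size of the improvement concrete, the two bounds can be compared term by term. This is only a rough sketch: it divides the leading terms of the two upper bounds and ignores the constants hidden in the $O(\cdot)$ notation, so it indicates how the gap scales rather than giving an exact ratio.

```latex
\[
\frac{d^{9.5}\,\sqrt{n}\,\log(n)^{7.5}}{d^{2.5}\,\sqrt{n}\,\log(n)}
  = d^{\,9.5-2.5}\,\log(n)^{\,7.5-1}
  = d^{7}\,\log(n)^{6.5}
\]
```

Both bounds share the same $\sqrt{n}$ dependence, so the gain is entirely in the dimension and logarithmic factors.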
