Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Abstract
We prove an information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation, expressed in terms of the dimension and the number of interactions. This improves on the bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.