
Minimax Regret for Bandit Convex Optimisation of Ridge Functions

Abstract

We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, \theta\rangle)$ for convex $g_t : \mathbb R \to \mathbb R$ and unknown $\theta \in \mathbb R^d$ that is homogeneous over time. We provide a short information-theoretic proof that the minimax regret is at most $O(d \sqrt{n} \log(n \operatorname{diam}(\mathcal K)))$ where $n$ is the number of interactions, $d$ the dimension and $\operatorname{diam}(\mathcal K)$ is the diameter of the constraint set.
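
To make the setting concrete, the following is a minimal sketch of the interaction protocol only, not of the paper's algorithm or proof. The helper name `make_ridge_loss`, the quadratic choice of $g_t$, the constraint set $\mathcal K = [-1, 1]^d$ and the uniformly random learner are all assumptions introduced for illustration.

```python
import numpy as np

def make_ridge_loss(theta, g):
    """Return the ridge function f(x) = g(<x, theta>) for a direction theta and convex g."""
    def f(x):
        return g(float(np.dot(x, theta)))
    return f

rng = np.random.default_rng(0)
d, n = 3, 5                      # dimension and number of interactions (illustrative values)

# Hidden direction theta: unknown to the learner and the same in every round.
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)

total_loss = 0.0
for t in range(n):
    # Hypothetical adversary: a convex quadratic g_t(u) = (u - c_t)^2 with a new c_t each round.
    c_t = rng.uniform(-1.0, 1.0)
    f_t = make_ridge_loss(theta, lambda u, c=c_t: (u - c) ** 2)

    # Placeholder learner: a uniformly random point in K = [-1, 1]^d (assumed constraint set).
    x_t = rng.uniform(-1.0, 1.0, size=d)

    # Bandit feedback: the learner observes only the scalar f_t(x_t), not g_t or theta.
    y_t = f_t(x_t)
    total_loss += y_t
```

The loss in each round depends on $x_t$ only through the one-dimensional projection $\langle x_t, \theta\rangle$, which is the structure that the ridge-function restriction imposes on the adversary.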
