We design differentially private algorithms for the problem of online linear optimization in the full-information and bandit settings with optimal $\tilde{O}(\sqrt{T})$ regret bounds. In the full-information setting, our results demonstrate that $\epsilon$-differential privacy may be ensured for free: in particular, the regret bounds scale as $O(\sqrt{T}) + \tilde{O}\left(\frac{1}{\epsilon}\right)$. For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of $\tilde{O}\left(\frac{1}{\epsilon}\sqrt{T}\right)$, while the previously known best regret bound was $\tilde{O}\left(\frac{1}{\epsilon}T^{2/3}\right)$.