Cascading Bandit under Differential Privacy

Kun Wang
Jing Dong
Baoxiang Wang
Shuai Li
Shuo Shao
Abstract

This paper studies \emph{differential privacy (DP)} and \emph{local differential privacy (LDP)} in cascading bandits. Under DP, we propose an algorithm that guarantees $\epsilon$-indistinguishability and a regret of $\mathcal{O}((\frac{\log T}{\epsilon})^{1+\xi})$ for an arbitrarily small $\xi$. This is a significant improvement over the $\mathcal{O}(\frac{\log^3 T}{\epsilon})$ regret of previous work. Under $(\epsilon, \delta)$-LDP, we relax the $K^2$ dependence through the tradeoff between the privacy budget $\epsilon$ and the error probability $\delta$, and obtain a regret of $\mathcal{O}(\frac{K \log(1/\delta) \log T}{\epsilon^2})$, where $K$ is the size of the arm subset. This result holds for both the Gaussian mechanism and the Laplace mechanism, through an analysis of their composition. Our results extend to combinatorial semi-bandits. We show respective lower bounds for DP and LDP cascading bandits. Extensive experiments corroborate our theoretical findings.
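
As a rough illustration of the local setting described in the abstract, the sketch below applies the textbook Laplace mechanism to one round of user feedback before it leaves the user's device. It is not the paper's algorithm: the function names `laplace_noise` and `privatize_feedback` are hypothetical, and the feedback is assumed to be a vector in $[0,1]^K$, so its $\ell_1$ sensitivity is at most $K$. Calibrating the noise scale to $K/\epsilon$ gives a per-coordinate noise variance of $2K^2/\epsilon^2$, which is one way a $K^2$ factor can arise under pure $\epsilon$-LDP.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    Uses the fact that the difference of two i.i.d. exponential
    variables with mean `scale` is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_feedback(clicks, epsilon, rng):
    """Return a noisy copy of a feedback vector (hypothetical helper).

    Assumption: each entry of `clicks` lies in [0, 1], so changing the
    whole vector moves it by at most K in L1 norm. Adding Laplace noise
    of scale K / epsilon to every coordinate is the standard way to make
    the released vector epsilon-LDP.
    """
    k = len(clicks)
    scale = k / epsilon
    return [c + laplace_noise(scale, rng) for c in clicks]


rng = random.Random(0)
# One round with K = 4 recommended items; the user clicked the third one.
noisy = privatize_feedback([0, 0, 1, 0], epsilon=1.0, rng=rng)
```

The noise is zero-mean, so a learner that averages many privatized rounds still recovers unbiased estimates of the click probabilities, at the cost of a higher variance as $\epsilon$ shrinks or $K$ grows.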
