On the Tradeoff between Privacy and Distortion in Differential Privacy

In this paper, we consider the setting in which the output of a differentially private mechanism lies in the same universe as the input, and measure its usefulness by (the negative of) the distortion between the output and the input. This setting can be regarded as the synthetic database release problem. We define a privacy-distortion function $\epsilon^*(D)$ as the smallest (best) achievable differential privacy level given a distortion upper bound $D$, and quantify the fundamental privacy-distortion tradeoff by characterizing $\epsilon^*(D)$. Specifically, we first obtain an upper bound on $\epsilon^*(D)$ by designing a mechanism $\mathcal{E}$. Then we derive a lower bound on $\epsilon^*(D)$ that deviates from the upper bound only by a constant. It turns out that $\mathcal{E}$ is an optimal mechanism when the database is drawn uniformly from the universe, i.e., the upper bound and the lower bound meet. A significant advantage of the mechanism $\mathcal{E}$ is that its distortion guarantee does not depend on the prior and its implementation is computationally efficient, although it may not always be optimal. From a learning perspective, we further introduce a new notion of differential privacy defined on posterior probabilities, which we call a posteriori differential privacy. Under this notion, the exact form of the privacy-distortion function is obtained for a wide range of distortion values. We then establish a fundamental connection between the privacy-distortion tradeoff and information-theoretic rate-distortion theory. An interesting finding is that rate-distortion and privacy-distortion are consistent under a posteriori differential privacy, which is shown by devising a mechanism that simultaneously minimizes the mutual information and the privacy level.
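
For concreteness, the privacy-distortion function described above admits a natural formalization along the following lines; the expected-distortion constraint and the generic distortion measure $d(\cdot,\cdot)$ are illustrative assumptions, since the abstract does not fix them:

    \epsilon^*(D) \;=\; \inf \Big\{ \epsilon \ge 0 \;:\; \exists\ \epsilon\text{-differentially private mechanism } \mathcal{M} \text{ with } \mathbb{E}\big[ d(X, \mathcal{M}(X)) \big] \le D \Big\}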
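
The abstract does not specify how the mechanism $\mathcal{E}$ works internally. As a rough, non-authoritative sketch of how a distortion-driven synthetic-release mechanism of this general kind can be implemented over a finite universe, the classical exponential mechanism with negative distortion as the score is shown below; the names `universe`, `hamming_distance`, and the parameters are illustrative, not from the paper.

    import math
    import random
    from itertools import product

    def hamming_distance(x, y):
        """Distortion: number of positions where two databases differ."""
        return sum(a != b for a, b in zip(x, y))

    def exponential_release(x, universe, epsilon, sensitivity=1.0):
        """Sample a synthetic database y from `universe` with probability
        proportional to exp(-epsilon * d(x, y) / (2 * sensitivity)).

        This is the classical exponential mechanism with negative distortion
        as the score; it is epsilon-differentially private when changing one
        record of x changes d(x, y) by at most `sensitivity`.
        """
        scores = [-hamming_distance(x, y) for y in universe]
        m = max(scores)  # shift scores for numerical stability
        weights = [math.exp(epsilon * (s - m) / (2.0 * sensitivity))
                   for s in scores]
        return random.choices(universe, weights=weights, k=1)[0]

    # Toy example: the universe of all binary databases with 3 records.
    universe = list(product([0, 1], repeat=3))
    print(exponential_release((0, 1, 1), universe, epsilon=1.0))

Note that the distortion behavior of this sketch, like that of $\mathcal{E}$ as described in the abstract, does not depend on any prior over the universe, and sampling is linear in the universe size.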
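
Likewise, the formal definition of a posteriori differential privacy is not stated in the abstract; one plausible reading, offered here purely as an assumption, bounds the ratio of posterior probabilities of neighboring databases given the released output:

    \Pr(X = x \mid Y = y) \;\le\; e^{\epsilon}\, \Pr(X = x' \mid Y = y) \quad \text{for all outputs } y \text{ and all neighboring } x \sim x'.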