
Gradual Release of Sensitive Data under Differential Privacy

Abstract

We introduce the problem of releasing sensitive data under differential privacy when the privacy level is subject to change over time. Existing work assumes that the privacy level is fixed by the system designer before the sensitive data is released. For certain applications, however, users may wish to relax the privacy level for subsequent releases of the same data, following either a re-evaluation of their privacy concerns or a need for better accuracy. Specifically, given a database containing sensitive data, we assume that a response $y_1$ that preserves $\epsilon_1$-differential privacy has already been published. The privacy level is then relaxed to $\epsilon_2$, with $\epsilon_2 > \epsilon_1$, and we wish to publish a more accurate response $y_2$ such that the joint response $(y_1, y_2)$ preserves $\epsilon_2$-differential privacy. How much accuracy is lost in the scenario of gradually releasing two responses $y_1$ and $y_2$ compared to releasing a single response that is $\epsilon_2$-differentially private? Our results show that there exists a composite mechanism that achieves \textit{no loss} in accuracy. We consider the case in which the private data lies within $\mathbb{R}^n$ with an adjacency relation induced by the $\ell_1$-norm, and we focus on mechanisms that approximate identity queries. We show that the same accuracy can be achieved in the case of gradual release through a mechanism whose outputs can be described by a \textit{lazy Markov stochastic process}. This stochastic process has a closed-form expression and can be efficiently sampled. Our results apply beyond identity queries; we demonstrate their use in several settings, including Google's RAPPOR project, trading of sensitive data, and controlled transmission of private data in a social network.
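
The paper's composite (lazy Markov) mechanism is not reproduced here. For orientation only, the sketch below, with hypothetical function names and assuming identity queries under $\ell_1$-adjacency with sensitivity 1, implements the naive baseline: publish $y_1$ with Laplace noise at level $\epsilon_1$, then spend the remaining budget $\epsilon_2 - \epsilon_1$ on an independent second release. By sequential composition the pair $(y_1, y_2)$ is $\epsilon_2$-differentially private, but $y_2$ is noisier than a single $\epsilon_2$-private release; this is exactly the accuracy loss that the paper shows can be avoided.

```python
# Illustrative baseline only (an assumption for exposition), not the paper's
# lazy Markov mechanism. Setting: identity query on x in R^n, adjacency
# ||x - x'||_1 <= 1, so Laplace noise at scale 1/eps yields eps-DP.
import numpy as np

def laplace_identity(x: np.ndarray, eps: float, rng: np.random.Generator) -> np.ndarray:
    """eps-differentially private approximation of the identity query."""
    return x + rng.laplace(scale=1.0 / eps, size=x.shape)

def gradual_release_baseline(x, eps1, eps2, rng):
    """Publish y1 at level eps1 first; later spend the remaining eps2 - eps1
    on an independent release y2. By sequential composition, (y1, y2) is
    eps2-DP, but y2 alone has noise scale 1/(eps2 - eps1) > 1/eps2."""
    assert eps2 > eps1 > 0
    y1 = laplace_identity(x, eps1, rng)
    y2 = laplace_identity(x, eps2 - eps1, rng)  # independent, fresh noise
    return y1, y2

rng = np.random.default_rng(0)
x = np.zeros(5)  # hypothetical sensitive record
y1, y2 = gradual_release_baseline(x, eps1=0.5, eps2=1.0, rng=rng)
```

In contrast, the composite mechanism of the paper couples the two releases so that $y_2$ has the same accuracy as a single $\epsilon_2$-differentially private response while the pair $(y_1, y_2)$ still satisfies $\epsilon_2$-differential privacy.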
