Formal Privacy for Partially Private Data

Differential privacy (DP) requires that any statistic based on confidential data be released with additional noise to protect privacy. Such a restriction can be logistically impossible to satisfy, for example due to policy-mandated disclosures in the present or unsanitized data releases in the past. Still, we want to preserve DP-style privacy guarantees for future data releases in excess of this pre-existing public information. In this paper, we present a privacy formalism, ε-DP relative to Z, extending Pufferfish privacy, that accommodates DP-style semantics in the presence of public information Z. We introduce two mechanisms for releasing partially private data (PPD) and prove desirable properties, such as asymptotic negligibility of privacy-induced errors and congeniality with as-is public information. We demonstrate theoretically and empirically how statistical inference from PPD degrades under post-processing, and we propose alternative inference algorithms for estimating statistics from PPD. Together, the framework, mechanisms, and inferential tools aim to help practitioners overcome the real logistical barriers that arise when public information is an unavoidable component of the data release process.
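
For context, the "additional noise" requirement in the first sentence is usually met with a standard baseline such as the Laplace mechanism. The sketch below illustrates only that generic baseline, not the PPD mechanisms proposed in the paper; the function name, sensitivity, and epsilon values are illustrative assumptions.

```python
import numpy as np

def laplace_mechanism(true_statistic, sensitivity, epsilon, rng=None):
    """Textbook epsilon-DP release: add Laplace noise with scale sensitivity/epsilon.

    Generic illustration of DP's noise requirement; not the paper's PPD mechanism.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_statistic + rng.laplace(loc=0.0, scale=scale)

# Hypothetical usage: privately release a count query with epsilon = 0.5.
confidential_count = 1_234  # statistic computed on confidential data (made up)
noisy_count = laplace_mechanism(confidential_count, sensitivity=1.0, epsilon=0.5)
```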