Coupling Dimensionality Reduction with Generative Model for
Non-Interactive Private Data Release

A key challenge in designing differentially private systems in the non-interactive setting is maintaining the utility of the released data for general analytics applications. To overcome this challenge, we propose the PCA-Gauss system, which leverages a novel combination of dimensionality reduction and a generative model for synthesizing differentially private data. We present multiple algorithms that combine dimensionality reduction and generative models for a range of machine learning applications, covering both unsupervised and supervised learning. Our key theoretical results prove that (a) our algorithms satisfy the strong epsilon-differential privacy guarantee, and (b) dimensionality reduction can quadratically lower the level of perturbation required for differential privacy, at a minimal cost in information loss. Finally, we illustrate the effectiveness of PCA-Gauss on three common machine learning applications -- clustering, classification, and regression -- over three large real-world datasets. Our empirical results show that (a) PCA-Gauss outperforms previous approaches by an order of magnitude, and (b) its loss in utility compared to the non-private real data is small. Thus, PCA-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
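At a high level, the pipeline projects the data onto a low-dimensional subspace, fits a perturbed Gaussian model there, and samples synthetic records from it. The Python sketch below illustrates that flow only; it is not the paper's implementation, and the noise scales, sensitivity constants, and equal privacy-budget split (eps_pca, eps_gauss) are placeholder assumptions for exposition.

# A minimal sketch of a PCA-plus-Gaussian pipeline for private synthetic data.
# NOT the paper's algorithm: noise scales, sensitivity constants, and the
# equal privacy-budget split are placeholder assumptions.
import numpy as np

def pca_gauss_sketch(X, k, epsilon, rng=None):
    """Project X (n x d, rows assumed to have bounded norm) to k dimensions,
    fit a perturbed Gaussian, and return n synthetic records in d dimensions."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    eps_pca, eps_gauss = epsilon / 2, epsilon / 2   # illustrative budget split

    # 1. Dimensionality reduction: perturb the empirical covariance with
    #    Laplace noise (Laplace mechanism for epsilon-DP), keep top-k directions.
    cov = X.T @ X / n
    noisy_cov = cov + rng.laplace(scale=2.0 / (n * eps_pca), size=(d, d))
    noisy_cov = (noisy_cov + noisy_cov.T) / 2       # re-symmetrize
    _, vecs = np.linalg.eigh(noisy_cov)             # eigenvalues in ascending order
    P = vecs[:, -k:]                                # d x k projection matrix

    # 2. Generative model: a Gaussian in the k-dim space with perturbed
    #    mean and covariance (placeholder sensitivities).
    Z = X @ P
    mu = Z.mean(axis=0) + rng.laplace(scale=2.0 / (n * eps_gauss), size=k)
    sigma = np.cov(Z, rowvar=False) + rng.laplace(scale=2.0 / (n * eps_gauss), size=(k, k))
    sigma = (sigma + sigma.T) / 2
    w, v = np.linalg.eigh(sigma)
    sigma = (v * np.clip(w, 1e-6, None)) @ v.T      # project back to positive semidefinite

    # 3. Release: sample synthetic records and map them to the original space.
    Z_syn = rng.multivariate_normal(mu, sigma, size=n)
    return Z_syn @ P.T

Splitting the privacy budget between the projection step and the model parameters, and clipping the perturbed covariance back to positive semidefinite, are standard devices to keep the sampled data well defined; the actual algorithms, sensitivity analysis, and privacy proofs are given in the paper.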