
What Can We Learn Privately?

Abstract

Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a recent notion that provides meaningful confidentiality guarantees in the presence of arbitrary side information. We introduce and formulate private learning problems. Our goal is a broad understanding of the resources required for private learning in terms of samples, computation time, and interaction. Along the way we develop novel algorithmic tools and bounds on the sample size required by private learning algorithms. Specifically, we provide: (1) A generic, distribution-free private learning algorithm that uses approximately log|C| samples to learn a concept class C. This is a private analogue of Occam's razor. The generic learner is not always computationally efficient. (2) A computationally efficient, distribution-free private PAC learner for the class of parity functions. (3) A precise characterization of local, or randomized response, private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. (4) A separation between the power of interactive and noninteractive local learning algorithms. Because of the equivalence to SQ learning, this result also separates adaptive and nonadaptive SQ learning.
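The local model referred to in result (3) generalizes classical randomized response: each participant randomizes their own data before it ever reaches the curator. As a minimal illustrative sketch (not code from the paper), the following shows one-bit randomized response satisfying ε-differential privacy, together with the standard debiasing step an aggregator would apply; function names and the estimation task are illustrative assumptions.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Locally randomize a single bit: report it truthfully with
    probability e^eps / (e^eps + 1), flip it otherwise.
    This randomizer satisfies eps-differential privacy, since the two
    output probabilities differ by a factor of at most e^eps."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_true else 1 - bit

def estimate_mean(reports, epsilon):
    """Debias the noisy reports to estimate the true fraction of 1s.
    If m is the mean of the reports and p the truth probability,
    E[m] = p * mu + (1 - p) * (1 - mu), which inverts to the formula below."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    noisy_mean = sum(reports) / len(reports)
    return (noisy_mean - (1.0 - p)) / (2.0 * p - 1.0)
```

Because each report individually carries the privacy guarantee, no trusted curator is needed; the cost is noisier estimates, which is one source of the separations the paper establishes between local and centralized learners.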
