Scaling Private Record Linkage using Output Constrained Differential Privacy

Conference on Computer and Communications Security (CCS), 2017

2 February 2017

Xi He

Ashwin Machanavajjhala

Cheryl J. Flynn

D. Srivastava

Abstract

Many scenarios require computing the join of databases held by two or more parties that do not trust one another. Private record linkage is a cryptographic tool that allows such a join to be computed without leaking any information about records that do not participate in the join output. However, such strong security comes at a price: except for exact equi-joins, these techniques have a high computational cost. While blocking has been used successfully to scale non-private record linkage by trading output accuracy for efficiency, we show that prior techniques that use blocking, even those based on differentially private algorithms, leak information about non-matching records. In this paper, we seek methods for speeding up private record linkage with strong provable privacy guarantees. We propose a novel privacy model, called output constrained differential privacy, that shares the strong privacy protection of differential privacy but allows for the truthful release of the output of a certain function on the data. We apply this model to private record linkage and show that algorithms satisfying it permit the disclosure of the true matching records, while their execution is insensitive to the presence or absence of a single non-matching record. We show that prior attempts to speed up record linkage do not satisfy even our privacy model. We develop novel algorithms for private record linkage that satisfy our privacy model, and explore the resulting three-way tradeoff among privacy, efficiency, and output accuracy using experiments on real datasets.
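
To make the efficiency versus accuracy tradeoff of blocking concrete, the following is a minimal sketch of plain, non-private blocking. It is not one of the paper's algorithms and provides no privacy protection. The function names (`similar`, `all_pairs_link`, `blocked_link`), the toy similarity rule, the first-character blocking key, and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import product


def similar(a, b, max_diff=1):
    """Toy matcher (an assumption): same length, at most max_diff differing characters."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_diff


def all_pairs_link(left, right):
    """Baseline: compare every pair, costing len(left) * len(right) comparisons."""
    matches = {(a, b) for a, b in product(left, right) if similar(a, b)}
    return matches, len(left) * len(right)


def blocked_link(left, right, key=lambda s: s[0]):
    """Blocking: compare only records that share a blocking key (here, the first character)."""
    blocks = defaultdict(list)
    for b in right:
        blocks[key(b)].append(b)

    matches, comparisons = set(), 0
    for a in left:
        for b in blocks.get(key(a), []):
            comparisons += 1
            if similar(a, b):
                matches.add((a, b))
    return matches, comparisons


# Hypothetical records held by the two parties.
left = ["anna", "brian", "carla", "derek"]
right = ["anne", "bryan", "karla", "doris"]

full_matches, full_cost = all_pairs_link(left, right)
block_matches, block_cost = blocked_link(left, right)

print(f"all pairs: {len(full_matches)} matches, {full_cost} comparisons")
print(f"blocking:  {len(block_matches)} matches, {block_cost} comparisons")
```

On this toy input, blocking cuts the number of comparisons from 16 to 3 but misses the pair ("carla", "karla"), because the two records fall into different blocks. That lost match is the accuracy side of the tradeoff described in the abstract.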
