Algorithmically Effective Differentially Private Synthetic Data

Yi He
Roman Vershynin
Yizhe Zhu
Abstract

We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $X$ in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $Y$ such that the expected 1-Wasserstein distance between the empirical measures of $X$ and $Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and is $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. The accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon dn)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method in (Boedihardjo et al., 2022) for $d\geq 2$.
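
To make the quantities in the abstract concrete, the following is a minimal sketch of a standard noisy-histogram baseline for $d=1$, together with the 1-Wasserstein distance between two equal-size empirical measures. It is not the algorithm of this paper. The function names, the bin count, and the replace-one-record neighboring relation (under which $n$ is treated as public) are assumptions made for illustration only.

```python
import numpy as np

def laplace_histogram_synthetic(x, epsilon, bins, rng):
    """Illustrative baseline (NOT the paper's algorithm): synthetic data on [0, 1]
    sampled from a Laplace-perturbed histogram of x."""
    n = len(x)  # assumed public under the replace-one-record neighboring relation
    counts, edges = np.histogram(x, bins=bins, range=(0.0, 1.0))
    # Replacing one record changes at most two counts by 1 each, so the
    # L1 sensitivity is 2 and Laplace noise of scale 2/epsilon gives epsilon-DP.
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=bins)
    # Everything below is post-processing of the noisy counts.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    probs = noisy / total if total > 0 else np.full(bins, 1.0 / bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return centers[rng.choice(bins, size=n, p=probs)]

def wasserstein1_1d(x, y):
    """1-Wasserstein distance between two equal-size empirical measures on the
    real line: the mean absolute difference of the sorted samples."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(0)
x = rng.random(1000)                      # toy dataset in [0, 1]
y = laplace_histogram_synthetic(x, epsilon=1.0, bins=20, rng=rng)
print(wasserstein1_1d(x, y))              # utility of the baseline on this draw
```

The sorted-sample formula is specific to one dimension; for $d\geq 2$ the 1-Wasserstein distance requires solving an optimal transport problem, which this sketch does not attempt.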
