Linear Queries Estimation with Local Differential Privacy

Abstract

We study the problem of estimating a set of $d$ linear queries with respect to some unknown distribution $\mathbf{p}$ over a domain $\mathcal{J}=[J]$ based on a sensitive data set of $n$ individuals under the constraint of local differential privacy. This problem subsumes a wide range of estimation tasks, e.g., distribution estimation and $d$-dimensional mean estimation. We provide new algorithms for both the offline (non-adaptive) and adaptive versions of this problem. In the offline setting, the set of queries is fixed before the algorithm starts. In the regime where $n\lesssim d^2/\log(J)$, our algorithms attain $L_2$ estimation error that is independent of $d$ and is tight up to a factor of $\tilde{O}\left(\log^{1/4}(J)\right)$. For the special case of distribution estimation, we show that projecting the output estimate of an algorithm due to [Acharya et al. 2018] onto the probability simplex yields an $L_2$ error that depends only sub-logarithmically on $J$ in the regime where $n\lesssim J^2/\log(J)$. These results show that accurate estimation of linear queries is possible in high-dimensional settings under the $L_2$ error criterion. In the adaptive setting, the queries are generated over $d$ rounds, one query at a time. In each round, a query can be chosen adaptively based on the full history of previous queries and answers. We give an algorithm for this problem with optimal $L_{\infty}$ estimation error (the worst-case error in the estimated values for the queries w.r.t. the data distribution). Our bound matches a lower bound on the $L_{\infty}$ error for the offline version of this problem [Duchi et al. 2013].
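
The projection step mentioned for distribution estimation can be illustrated with a minimal sketch. It assumes the projection is the Euclidean ($L_2$) projection onto the probability simplex and uses the standard sorting-based method; the abstract does not specify the procedure, so this is not necessarily the one used in the paper. The function name `project_onto_simplex` and the sample input are hypothetical.

```python
def project_onto_simplex(v):
    """Euclidean projection of a real vector v onto the probability simplex.

    Assumption: standard sorting-based method, shown for illustration only.
    """
    u = sorted(v, reverse=True)      # entries in decreasing order
    running_sum = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        running_sum += u_k
        candidate = (running_sum - 1.0) / k
        # The condition holds for a prefix of k values; keep the last one.
        if u_k - candidate > 0:
            theta = candidate
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


# Hypothetical noisy estimate, e.g. the raw output of a locally private
# frequency estimator, which may have negative entries or not sum to 1.
noisy_estimate = [0.6, 0.5, -0.1]
projected = project_onto_simplex(noisy_estimate)
print(projected, sum(projected))
```

Since the true distribution lies in the simplex and the simplex is convex, this projection never increases the $L_2$ distance to the truth, which is why it is a natural post-processing step for a noisy estimate.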
