Estimation Efficiency Under Privacy Constraints

8 July 2017

Abstract

We investigate the problem of estimating a random variable $Y\in \mathcal{Y}$ under a privacy constraint dictated by another random variable $X\in \mathcal{X}$, where estimation efficiency and privacy are assessed in terms of two different loss functions. In the discrete case, we use the Hamming loss function and express the corresponding utility-privacy tradeoff in terms of the privacy-constrained guessing probability $h(P_{XY}, \epsilon)$, defined as the maximum probability $\mathsf{P}_\mathsf{c}(Y|Z)$ of correctly guessing $Y$ given an auxiliary random variable $Z\in \mathcal{Z}$, where the maximization is taken over all $P_{Z|Y}$ ensuring that $\mathsf{P}_\mathsf{c}(X|Z)\leq \epsilon$ for a given privacy threshold $\epsilon \geq 0$. We prove that $h(P_{XY}, \cdot)$ is concave and piecewise linear, which allows us to derive its closed-form expression for any $\epsilon$ when $X$ and $Y$ are binary. In the non-binary case, we derive $h(P_{XY}, \epsilon)$ in the high-utility regime (i.e., for sufficiently large values of $\epsilon$) under the assumption that $Z$ takes values in $\mathcal{Y}$. We also analyze the privacy-constrained guessing probability for two binary vector scenarios. When $X$ and $Y$ are continuous random variables, we use the squared-error loss function and express the corresponding utility-privacy tradeoff in terms of $\mathsf{sENSR}(P_{XY}, \epsilon)$, the smallest normalized minimum mean squared error (mmse) incurred in estimating $Y$ from its Gaussian perturbation $Z$, subject to the mmse of $f(X)$ given $Z$ being within $\epsilon$ of the variance of $f(X)$ for every non-constant real-valued function $f$. We derive tight upper and lower bounds for $\mathsf{sENSR}$ when $Y$ is Gaussian. We also obtain a tight lower bound for $\mathsf{sENSR}(P_{XY}, \epsilon)$ for general absolutely continuous random variables when $\epsilon$ is sufficiently small.
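To make the definition of $h(P_{XY}, \epsilon)$ concrete, the sketch below approximates it numerically in the binary case. It is not the closed-form method of the paper; it is a brute-force illustration under assumptions of our own: $X$, $Y$ and $Z$ are all binary, $Z$ is generated from $Y$ alone through $P_{Z|Y}$ (so $X - Y - Z$ is a Markov chain), and $P_{Z|Y}$ is searched over a finite grid. Because it maximizes over a subset of channels, the returned value can only be a lower approximation of $h(P_{XY}, \epsilon)$. It uses the MAP guessing probability $\mathsf{P}_\mathsf{c}(A|Z)=\sum_z \max_a P_{AZ}(a,z)$. The function names `guessing_prob` and `h_binary_grid` are hypothetical.

```python
from itertools import product


def guessing_prob(joint):
    """Probability of correctly guessing A from Z with a MAP rule.

    joint[a][z] = P(A = a, Z = z); returns sum_z max_a P(A = a, Z = z).
    """
    n_z = len(joint[0])
    return sum(max(row[z] for row in joint) for z in range(n_z))


def h_binary_grid(p_xy, eps, steps=200, tol=1e-12):
    """Grid-search lower approximation of h(P_XY, eps).

    Illustrative assumptions (not from the paper): X, Y, Z are binary,
    Z depends on X only through Y (X - Y - Z Markov chain), and the
    channel P_{Z|Y} ranges over a grid with spacing 1/steps.
    p_xy[x][y] = P(X = x, Y = y). Returns None if no grid channel
    satisfies the privacy constraint P_c(X|Z) <= eps.
    """
    p_y = [p_xy[0][y] + p_xy[1][y] for y in range(2)]
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for a, b in product(grid, repeat=2):
        # Row y of the channel: [P(Z=0 | Y=y), P(Z=1 | Y=y)].
        ch = [[1 - a, a], [b, 1 - b]]
        p_yz = [[p_y[y] * ch[y][z] for z in range(2)] for y in range(2)]
        # P(X=x, Z=z) = sum_y P(X=x, Y=y) * P(Z=z | Y=y)
        p_xz = [[sum(p_xy[x][y] * ch[y][z] for y in range(2))
                 for z in range(2)] for x in range(2)]
        if guessing_prob(p_xz) <= eps + tol:  # privacy constraint
            utility = guessing_prob(p_yz)     # P_c(Y|Z)
            if best is None or utility > best:
                best = utility
    return best


if __name__ == "__main__":
    # Doubly symmetric binary pair: P(X != Y) = 0.2, uniform marginals.
    p_xy = [[0.4, 0.1], [0.1, 0.4]]
    for eps in (0.4, 0.5, 0.65, 0.8):
        print(eps, h_binary_grid(p_xy, eps))
```

Since $\mathsf{P}_\mathsf{c}(X|Z)\geq \max_x P_X(x)$ for any $Z$, thresholds below $\max_x P_X(x)$ are infeasible and the sketch returns `None`. For this example, the constraint at $\epsilon = 0.5$ forces $Z$ to be independent of $Y$, giving utility $0.5$, while at $\epsilon = 0.8$ the identity channel $Z = Y$ is already admissible, giving utility $1$. A finer grid or a larger $\mathcal{Z}$ can only raise the estimate.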
