
Private Geometric Median in Nearly-Linear Time

Main: 21 pages
5 figures
1 table
Bibliography: 3 pages
Appendix: 1 page
Abstract

Estimating the geometric median of a dataset is a robust counterpart to mean estimation, and is a fundamental problem in computational geometry. Recently, [HSU24] gave an $(\varepsilon, \delta)$-differentially private algorithm obtaining an $\alpha$-multiplicative approximation to the geometric median objective, $\frac{1}{n} \sum_{i \in [n]} \|\cdot - \mathbf{x}_i\|$, given a dataset $\mathcal{D} := \{\mathbf{x}_i\}_{i \in [n]} \subset \mathbb{R}^d$. Their algorithm requires $n \gtrsim \sqrt{d} \cdot \frac{1}{\alpha\varepsilon}$ samples, which they prove is information-theoretically optimal. This result is surprising because its error scales with the \emph{effective radius} of $\mathcal{D}$ (i.e., of a ball capturing most points), rather than the worst-case radius. We give an improved algorithm that obtains the same approximation quality, also using $n \gtrsim \sqrt{d} \cdot \frac{1}{\alpha\varepsilon}$ samples, but in time $\widetilde{O}(nd + \frac{d}{\alpha^2})$. Our runtime is nearly-linear, plus the cost of the cheapest non-private first-order method due to [CLM+16]. To achieve our results, we use subsampling and geometric aggregation tools inspired by FriendlyCore [TCK+22] to speed up the "warm start" component of the [HSU24] algorithm, combined with a careful custom analysis of DP-SGD's sensitivity for the geometric median objective.
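To make the objective concrete: the (non-private) geometric median minimizes the average Euclidean distance to the data points, and the classical Weiszfeld iteration solves this as an iteratively reweighted mean. The sketch below is a minimal illustration of that objective, not the paper's algorithm (which is private and nearly-linear time); the iteration count and tolerance are arbitrary choices.

```python
import numpy as np

def geometric_median(X, iters=100, tol=1e-9):
    """Weiszfeld's algorithm: minimize (1/n) * sum_i ||y - x_i||.
    X is an (n, d) array of data points."""
    y = X.mean(axis=0)  # initialize at the coordinate-wise mean
    for _ in range(iters):
        d = np.linalg.norm(X - y, axis=1)
        d = np.maximum(d, tol)  # avoid division by zero at data points
        w = 1.0 / d
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def objective(y, X):
    """The geometric median objective: (1/n) * sum_i ||y - x_i||."""
    return np.linalg.norm(X - y, axis=1).mean()
```

The robustness mentioned in the abstract is visible here: a single far-away outlier shifts the mean arbitrarily but moves the geometric median only slightly, which is why the error can scale with the effective radius of the bulk of the data.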
