
Differentially Private Sparse Linear Regression with Heavy-tailed Responses

Main: 14 pages · 2 figures · 2 tables · Bibliography: 4 pages · Appendix: 13 pages
Abstract

As a fundamental problem in machine learning and differential privacy (DP), DP linear regression has been extensively studied. However, most existing methods focus primarily on either regular data distributions or low-dimensional cases with irregular data. To address these limitations, this paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings. In the first part, we introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of \(\tilde{O}\biggl(s^{*\frac{1}{2}} \cdot \biggl(\frac{\log d}{n}\biggr)^{\frac{\zeta}{1+\zeta}} + s^{*\frac{1+2\zeta}{2+2\zeta}} \cdot \biggl(\frac{\log^2 d}{n\varepsilon}\biggr)^{\frac{\zeta}{1+\zeta}}\biggr)\) under the \((\varepsilon, \delta)\)-DP model, where \(n\) is the sample size, \(d\) is the dimensionality, \(s^*\) is the sparsity of the parameter, and \(\zeta \in (0, 1]\) characterizes the tail heaviness of the data. In the second part, we propose DP-IHT-L, which further improves the error bound under additional assumptions on the response and achieves \(\tilde{O}\Bigl(\frac{(s^*)^{3/2} \log d}{n\varepsilon}\Bigr)\). Compared to the first result, this bound is independent of the tail parameter \(\zeta\). Finally, through experiments on synthetic and real-world datasets, we demonstrate that our methods outperform standard DP algorithms designed for "regular" data.
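To make the DP-IHT-H idea concrete, here is a minimal sketch of noisy iterative hard thresholding with a Huber-robustified gradient. This is an illustration under assumed details, not the paper's exact algorithm: the function name `dp_iht_huber`, the step size `eta`, the clipping level `tau`, and the noise scale `sigma` are all hypothetical, and the calibration of `sigma` to a target \((\varepsilon, \delta)\) budget (which the paper derives from the sensitivity of the clipped gradient) is omitted.

```python
import numpy as np

def huber_grad(residual, tau):
    # Derivative of the Huber loss w.r.t. the residual:
    # linear inside [-tau, tau], clipped (constant) outside.
    return np.clip(residual, -tau, tau)

def dp_iht_huber(X, y, s, T=50, eta=0.1, tau=1.0, sigma=1.0, seed=0):
    """Sketch of private iterative hard thresholding with a Huber loss.

    sigma is a placeholder Gaussian-noise scale; in a real DP algorithm
    it must be calibrated to (eps, delta) via the gradient sensitivity
    induced by the clipping level tau.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(T):
        residual = X @ theta - y
        # Robust gradient: heavy-tailed responses only enter through
        # the clipped residual, bounding each sample's influence.
        grad = X.T @ huber_grad(residual, tau) / n
        # Noisy gradient step (Gaussian mechanism).
        noisy = theta - eta * (grad + rng.normal(0.0, sigma, d))
        # Hard thresholding: keep the s largest-magnitude coordinates.
        keep = np.argpartition(np.abs(noisy), -s)[-s:]
        theta = np.zeros(d)
        theta[keep] = noisy[keep]
    return theta
```

The hard-thresholding step enforces the sparsity level \(s\) at every iteration, which is what lets the error bounds scale with \(s^*\) and \(\log d\) rather than with the ambient dimension \(d\).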

@article{tian2025_2506.06861,
  title={Differentially Private Sparse Linear Regression with Heavy-tailed Responses},
  author={Xizhi Tian and Meng Ding and Touming Tao and Zihang Xiang and Di Wang},
  journal={arXiv preprint arXiv:2506.06861},
  year={2025}
}