Beyond Worst-Case Dimensionality Reduction for Sparse Vectors

Abstract

We study beyond worst-case dimensionality reduction for $s$-sparse vectors. Our work is divided into two parts, each focusing on a different facet of beyond worst-case analysis.

We first consider average-case guarantees. A folklore upper bound based on the birthday paradox states that for any collection $X$ of $s$-sparse vectors in $\mathbb{R}^d$, there exists a linear map to $\mathbb{R}^{O(s^2)}$ that exactly preserves the norm of 99% of the vectors in $X$ in any $\ell_p$ norm (as opposed to the usual setting, where guarantees hold for all vectors). We give lower bounds showing that this is indeed optimal in many settings: any oblivious linear map satisfying similar average-case guarantees must map to $\Omega(s^2)$ dimensions. The same lower bound also holds for a wide class of smooth maps, including "encoder-decoder schemes", where we compare the norm of the original vector to that of a smooth function of the embedding. These lower bounds reveal a separation result, as an upper bound of $O(s \log(d))$ is possible if we instead use arbitrary (possibly non-smooth) functions, e.g., via compressed sensing algorithms.

Given these lower bounds, we specialize to sparse non-negative vectors. For a dataset $X$ of non-negative $s$-sparse vectors and any $p \ge 1$, we can non-linearly embed $X$ via a map $f$ into $O(s\log(|X|s)/\epsilon^2)$ dimensions while preserving all pairwise distances in the $\ell_p$ norm up to a factor of $1 \pm \epsilon$, with no dependence on $p$. Surprisingly, the non-negativity assumption enables much smaller embeddings than for arbitrary sparse vectors, where the best known bounds suffer exponential dependence. Our map also guarantees exact dimensionality reduction for $\ell_{\infty}$ by embedding into $O(s\log |X|)$ dimensions, which is tight. We show that both the non-linearity of $f$ and the non-negativity of $X$ are necessary, and provide downstream algorithmic improvements.
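
The folklore birthday-paradox upper bound can be illustrated with a minimal sketch. It assumes a fully random hash $h: [d] \to [m]$ with $m = 50 s^2$ buckets. The constant 50 is our own choice: a union bound over the $\binom{s}{2}$ pairs in a support gives collision probability at most $s^2/(2m) = 1/100$. The sketch also assumes the standard argument. When the support of a vector lands in distinct buckets, the linear map $y_{h(i)} \mathrel{+}= x_i$ only relabels the nonzero entries, so every $\ell_p$ norm is preserved exactly. By linearity of expectation, a random $h$ is then collision-free on at least 99% of $X$ on average, so some fixed $h$ works. All function names below (`make_hash`, `embed`, `collision_free`, and so on) are hypothetical illustrations, not code from the paper.

```python
import math
import random


def make_hash(d, m, seed=0):
    # Hypothetical helper: one fixed random function h: {0,...,d-1} -> {0,...,m-1}.
    # It is drawn once and shared by every vector, so the map is oblivious.
    rng = random.Random(seed)
    return [rng.randrange(m) for _ in range(d)]


def embed(x, h, m):
    # Linear map R^d -> R^m: y[h(i)] += x[i]; x is a sparse vector {index: value}.
    y = [0.0] * m
    for i, v in x.items():
        y[h[i]] += v
    return y


def collision_free(x, h):
    # True when every coordinate in the support of x lands in a distinct bucket.
    return len({h[i] for i in x}) == len(x)


def lp_norm(values, p):
    # l_p norm of an iterable of numbers; p = math.inf gives the max norm.
    if p == math.inf:
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def same_nonzeros(x, y):
    # Same multiset of nonzero entries implies equal l_p norms for every p.
    return sorted(v for v in y if v != 0.0) == sorted(v for v in x.values() if v != 0.0)


def random_sparse(d, s, rng):
    # s nonzero coordinates, values in [-2, -1] or [1, 2] (never exactly zero).
    support = rng.sample(range(d), s)
    return {i: rng.choice((-1.0, 1.0)) * rng.uniform(1.0, 2.0) for i in support}


def fraction_exactly_preserved(X, h, m):
    # Fraction of vectors whose nonzero entries (hence all l_p norms) survive exactly.
    return sum(same_nonzeros(x, embed(x, h, m)) for x in X) / len(X)


if __name__ == "__main__":
    rng = random.Random(1)
    d, s = 10_000, 10
    m = 50 * s * s  # O(s^2) target dimension
    h = make_hash(d, m, seed=2)
    X = [random_sparse(d, s, rng) for _ in range(1000)]
    print(fraction_exactly_preserved(X, h, m))
```

Storing `h` as an explicit table of length $d$ keeps the sketch simple. A practical version would use a seeded hash function instead. The vectors here may have arbitrary signs, which matches the folklore statement for general $s$-sparse vectors.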

@article{silwal2025_2502.19865,
  title={Beyond Worst-Case Dimensionality Reduction for Sparse Vectors},
  author={ Sandeep Silwal and David P. Woodruff and Qiuyi Zhang },
  journal={arXiv preprint arXiv:2502.19865},
  year={2025}
}