Beyond Worst-Case Dimensionality Reduction for Sparse Vectors

27 February 2025

Abstract

We study beyond worst-case dimensionality reduction for $s$-sparse vectors. Our work is divided into two parts, each focusing on a different facet of beyond worst-case analysis.

We first consider average-case guarantees. A folklore upper bound based on the birthday paradox states: for any collection $X$ of $s$-sparse vectors in $\mathbb{R}^d$, there exists a linear map to $\mathbb{R}^{O(s^2)}$ which \emph{exactly} preserves the norm of $99\%$ of the vectors in $X$ in any $\ell_p$ norm (as opposed to the usual setting, where guarantees hold for all vectors). We give lower bounds showing that this is indeed optimal in many settings: any oblivious linear map satisfying similar average-case guarantees must map to $\Omega(s^2)$ dimensions. The same lower bound also holds for a wide class of smooth maps, including "encoder-decoder schemes", where we compare the norm of the original vector to that of a smooth function of the embedding. These lower bounds reveal a separation result, as an upper bound of $O(s \log(d))$ is possible if we instead use arbitrary (possibly non-smooth) functions, e.g., via compressed sensing algorithms.

Given these lower bounds, we specialize to sparse \emph{non-negative} vectors. For a dataset $X$ of non-negative $s$-sparse vectors and any $p \ge 1$, we can non-linearly embed $X$ into $O(s\log(|X|s)/\epsilon^2)$ dimensions while preserving all pairwise distances in $\ell_p$ norm up to a factor of $1\pm \epsilon$, with no dependence on $p$. Surprisingly, the non-negativity assumption enables much smaller embeddings than are known for arbitrary sparse vectors, where the best known bounds suffer exponential dependence. Our map also guarantees \emph{exact} dimensionality reduction for $\ell_{\infty}$ by embedding into $O(s\log |X|)$ dimensions, which is tight. We show that both the non-linearity of the embedding $f$ and the non-negativity of $X$ are necessary, and provide downstream algorithmic improvements.
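The folklore birthday-paradox bound can be illustrated with a random bucketing map. This is a minimal sketch under stated assumptions, not the paper's construction: each coordinate is assigned to one of $m$ buckets uniformly at random, and colliding entries are summed, which is a linear map. By a union bound over the $\binom{s}{2}$ pairs in the support, a fixed $s$-sparse vector has a collision with probability at most $s^2/(2m)$, so $m = 50 s^2$ gives at most $1\%$. When the support is collision-free, the embedded vector has the same nonzero entries and hence the same $\ell_p$ norm for every $p$. The helper names (`make_hash`, `embed`, `fraction_preserved`) and the constant 50 are illustrative choices.

```python
import random


def make_hash(d, m, seed=0):
    """Assign each of the d coordinates to one of m buckets uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(m) for _ in range(d)]


def embed(vec, h, m):
    """Linear bucketing map: sum the entries of a sparse vector that share a bucket.

    vec is a dict {coordinate index: value}; the result is a dense list of length m.
    """
    out = [0.0] * m
    for i, v in vec.items():
        out[h[i]] += v
    return out


def lp_norm(values, p):
    """l_p norm of an iterable of numbers; p may be float('inf')."""
    if p == float("inf"):
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def collision_free(vec, h):
    """True if no two support coordinates of vec land in the same bucket."""
    buckets = [h[i] for i in vec]
    return len(set(buckets)) == len(buckets)


def random_sparse_vector(d, s, rng):
    """Hypothetical test data: s random coordinates with values in [-1, 1]."""
    return {i: rng.uniform(-1.0, 1.0) for i in rng.sample(range(d), s)}


def fraction_preserved(vectors, h):
    """Fraction of vectors whose support is collision-free under h.

    For these vectors the embedding keeps every l_p norm unchanged.
    """
    return sum(collision_free(v, h) for v in vectors) / len(vectors)


if __name__ == "__main__":
    d, s = 10_000, 5
    m = 50 * s * s  # O(s^2) target dimension; the constant 50 is an assumption
    rng = random.Random(1)
    data = [random_sparse_vector(d, s, rng) for _ in range(1000)]
    h = make_hash(d, m, seed=2)
    print("target dimension:", m)
    print("fraction with norm preserved:", fraction_preserved(data, h))
```

The sketch draws one random map and measures it on random data. The existence statement in the abstract follows by averaging: since each vector in $X$ fails with probability at most $1\%$ over the random choice of map, some fixed map fails on at most $1\%$ of $X$.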


@article{silwal2025_2502.19865,
  title={Beyond Worst-Case Dimensionality Reduction for Sparse Vectors},
  author={Sandeep Silwal and David P. Woodruff and Qiuyi Zhang},
  journal={arXiv preprint arXiv:2502.19865},
  year={2025}
}
