
arXiv: 1608.04414 (v3, latest)

Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back

15 August 2016
Vitaly Feldman
Abstract

In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq \mathbf{E}_{\mathbf{f}\sim D}[\mathbf{f}(x)]$ over a convex set $\mathcal{K} \subset \mathbb{R}^d$, where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\mathcal{K}$. The optimization is commonly based on i.i.d. samples $f^1, f^2, \ldots, f^n$ from $D$. A standard approach to such problems is empirical risk minimization (ERM), which optimizes $F_S(x) \doteq \frac{1}{n}\sum_{i\leq n} f^i(x)$. Here we consider the question of how many samples are necessary for ERM to succeed, and the closely related question of uniform convergence of $F_S$ to $F$ over $\mathcal{K}$. We demonstrate that in the standard $\ell_p/\ell_q$ setting of Lipschitz-bounded functions over a $\mathcal{K}$ of bounded radius, ERM requires a sample size that scales linearly with the dimension $d$. This nearly matches standard upper bounds and improves on the $\Omega(\log d)$ dependence proved for the $\ell_2/\ell_2$ setting by Shalev-Shwartz et al. (2009). In stark contrast, these problems can be solved using a dimension-independent number of samples in the $\ell_2/\ell_2$ setting, and with $\log d$ dependence in the $\ell_1/\ell_\infty$ setting, using other approaches. We further show that our lower bound applies even if the functions in the support of $D$ are smooth and efficiently computable, and even if an $\ell_1$ regularization term is added. Finally, we demonstrate that for the more general class of bounded-range (but not Lipschitz-bounded) stochastic convex programs, an infinite gap appears already in dimension 2.
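To make the ERM setup concrete, the following is a minimal sketch (not from the paper) of empirical risk minimization for one simple instance of the $\ell_2/\ell_2$ setting: the sample functions are linear losses $f^i(x) = \langle a_i, x\rangle$ with $\|a_i\|_2 \le 1$, which are convex and 1-Lipschitz, and $F_S$ is minimized over the unit Euclidean ball $\mathcal{K}$ by projected gradient descent. All function and variable names are illustrative.

```python
import numpy as np

def erm_minimize(A, radius=1.0, steps=2000, lr=0.01):
    """Minimize the empirical risk F_S(x) = (1/n) * sum_i <a_i, x>
    over the l2 ball of the given radius via projected gradient descent.
    Rows of A are the gradients a_i of the linear sample losses."""
    n, d = A.shape
    g = A.mean(axis=0)           # gradient of F_S is constant: the mean of the a_i
    x = np.zeros(d)
    for _ in range(steps):
        x = x - lr * g           # gradient step
        nrm = np.linalg.norm(x)
        if nrm > radius:         # project back onto the ball K
            x = x * (radius / nrm)
    return x

# Illustrative data: 50 sample losses in dimension 5, each 1-Lipschitz.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
A /= np.linalg.norm(A, axis=1, keepdims=True)

x_hat = erm_minimize(A)
# For linear losses the ERM solution has a closed form: -radius * g / ||g||.
g = A.mean(axis=0)
x_star = -g / np.linalg.norm(g)
```

For linear losses the iterates stay parallel to $-g$, so the projection lands exactly on the true minimizer once the boundary is reached; the paper's point is that for general Lipschitz convex losses, making $F_S$ close to $F$ uniformly over $\mathcal{K}$ can require $n = \Omega(d)$ samples.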
