Online AUC Optimization for Sparse High-Dimensional Datasets

23 September 2020
Baojian Zhou
Yiming Ying
Steven Skiena
arXiv:2009.10867
Abstract

The Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification, arising in many application domains where high-dimensional sparse data is abundant. In such cases, each $d$-dimensional sample has only $k$ non-zero features with $k \ll d$, and data arrives sequentially in streaming form. Current online AUC optimization algorithms have a high per-iteration cost of $\mathcal{O}(d)$ and generally produce non-sparse solutions, and hence are not suitable for handling the data challenge mentioned above. In this paper, we aim to directly optimize the AUC score for high-dimensional sparse datasets in the online learning setting and propose a new algorithm, FTRL-AUC. The proposed algorithm processes data in an online fashion with a much cheaper per-iteration cost of $\mathcal{O}(k)$, making it amenable to high-dimensional sparse streaming data analysis. Our algorithmic design critically depends on a novel reformulation of the U-statistics AUC objective as an empirical saddle-point problem, together with the introduction of a "lazy update" rule that reduces the per-iteration complexity from $\mathcal{O}(d)$ to $\mathcal{O}(k)$. Furthermore, FTRL-AUC inherently captures sparsity more effectively by applying a generalized Follow-The-Regularized-Leader (FTRL) framework. Experiments on real-world datasets demonstrate that FTRL-AUC significantly improves both run time and model sparsity while achieving competitive AUC scores compared with state-of-the-art methods. A comparison with the online learning method for the logistic loss shows that FTRL-AUC achieves higher AUC scores, especially when the datasets are imbalanced.
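To make the $\mathcal{O}(k)$ claim concrete, the sketch below shows how an FTRL-style online update can touch only the $k$ non-zero coordinates of each sparse sample while an L1 term drives model sparsity. This is a minimal illustration of a generic FTRL-Proximal update with a logistic-loss gradient standing in for the objective; it is not the authors' FTRL-AUC method, which optimizes the saddle-point AUC reformulation and applies its lazy-update rule there. The class name `FTRLProximalSparse` and all parameter names are illustrative.

```python
import numpy as np

class FTRLProximalSparse:
    """Sketch of a per-coordinate lazy FTRL-Proximal update on sparse inputs.

    Only the non-zero coordinates of each example are read or written, so a
    single online step costs O(k) rather than O(d). The L1 term keeps many
    weights exactly zero. Logistic loss is used here purely as a placeholder.
    """

    def __init__(self, dim, alpha=0.1, l1=1.0, l2=1.0):
        self.alpha = alpha       # learning-rate scale
        self.l1 = l1             # L1 strength (drives sparsity)
        self.l2 = l2             # L2 strength
        self.z = np.zeros(dim)   # accumulated adjusted gradients
        self.n = np.zeros(dim)   # accumulated squared gradients

    def _weight(self, i):
        # Closed-form FTRL-Proximal weight for coordinate i (soft-thresholded).
        if abs(self.z[i]) <= self.l1:
            return 0.0
        eta_inv = np.sqrt(self.n[i]) / self.alpha + self.l2
        return -(self.z[i] - np.sign(self.z[i]) * self.l1) / eta_inv

    def predict(self, idx, vals):
        # Score a sparse example given its non-zero indices and values: O(k).
        return sum(self._weight(i) * v for i, v in zip(idx, vals))

    def update(self, idx, vals, y):
        # One online step; only the k non-zero coordinates are touched.
        margin = self.predict(idx, vals)
        p = 1.0 / (1.0 + np.exp(-margin))   # logistic probability
        g_scale = p - y                     # gradient of logistic loss w.r.t. margin
        for i, v in zip(idx, vals):
            g = g_scale * v
            w_i = self._weight(i)
            sigma = (np.sqrt(self.n[i] + g * g) - np.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w_i
            self.n[i] += g * g
```

For example, `model.update(idx=[3, 17, 9001], vals=[0.5, 1.0, 2.0], y=1)` performs one step whose cost depends on the three non-zero features, not on the full dimension $d$; the dense arrays `z` and `n` are read lazily only at those indices.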
