Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning

8 July 2024
Yijun Dong
Hoang Phan
Xiang Pan
Qi Lei
Abstract

We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.
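
The two stages described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the per-sample gradients are already available as a matrix `G` with one row per sample, uses a plain Gaussian random projection as the sketch, and swaps in a simple greedy search for the moment-matching step, since the abstract does not specify the actual optimization. The function names `sketch_gradients` and `greedy_moment_matching` and the sketch dimension `m` are hypothetical.

```python
import numpy as np


def sketch_gradients(G, m, rng):
    """Stage (i), illustrative: project N x r per-sample gradients to N x m.

    Assumption: a Gaussian random projection stands in for the
    gradient sketching step; m plays the role of dim(S).
    """
    r = G.shape[1]
    S = rng.standard_normal((r, m)) / np.sqrt(m)
    return G @ S


def greedy_moment_matching(Z, n):
    """Stage (ii), illustrative: pick n rows of Z whose second moment
    is close to the second moment of all rows.

    Assumption: a greedy heuristic replaces whatever optimization the
    paper uses; it only demonstrates the idea of moment matching.
    """
    N, m = Z.shape
    target = Z.T @ Z / N                        # second moment of the full dataset
    outer = np.einsum("ni,nj->nij", Z, Z)       # per-sample outer products, N x m x m
    running = np.zeros((m, m))                  # sum of outer products of selected rows
    available = np.ones(N, dtype=bool)
    selected = []
    for k in range(1, n + 1):
        # Second moment of the selection if each candidate were added next.
        candidates = (running[None, :, :] + outer) / k
        # Frobenius distance of each candidate moment to the target.
        err = np.linalg.norm(candidates - target[None, :, :], axis=(1, 2))
        err[~available] = np.inf                # never pick the same sample twice
        i = int(np.argmin(err))
        selected.append(i)
        available[i] = False
        running += outer[i]
    return np.array(selected)
```

For example, with `rng = np.random.default_rng(0)` and a synthetic `G` of shape `(200, 50)`, calling `Z = sketch_gradients(G, 8, rng)` followed by `greedy_moment_matching(Z, 20)` returns 20 distinct row indices. The greedy step materializes an `N x m x m` array, so it is only practical for small `m`, which is the regime the sketching stage is meant to provide.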
