Large-Scale Differentially Private BERT

3 August 2021
Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, Pasin Manurangsi
arXiv:2108.01624
Abstract

In this work, we study the large-scale pretraining of BERT-Large with differentially private SGD (DP-SGD). We show that, combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP-SGD step for BERT; we also enhance its efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of [SVK20], who demonstrated that the overhead of a DP-SGD step is minimized with effective use of JAX [BFH+18, FJL18] primitives in conjunction with the XLA compiler [XLA17]. Our implementation achieves a masked language model accuracy of 60.5% at a batch size of 2M, for ϵ = 5.36. To put this number in perspective, non-private BERT models achieve an accuracy of ∼70%.
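The core of the DP-SGD step referenced in the abstract is per-example gradient clipping followed by Gaussian noise addition, which [SVK20] showed can be made cheap by expressing the per-example gradients with JAX's vmap and letting XLA compile the result. Below is a minimal, illustrative sketch of that step, not code from the paper: the function name dp_sgd_gradient, the loss_fn(params, example) signature, and the hyperparameter names are assumptions for the example.

```python
import jax
import jax.numpy as jnp

def dp_sgd_gradient(loss_fn, params, batch, l2_clip, noise_multiplier, key):
    """Sketch of one DP-SGD gradient: per-example clipping + Gaussian noise."""
    # Per-example gradients: vmap the gradient over the leading batch axis.
    grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0))(params, batch)

    # Per-example global L2 norm, computed across all parameter leaves.
    sq_norms = jax.tree_util.tree_map(
        lambda g: jnp.sum(jnp.square(g.reshape(g.shape[0], -1)), axis=1), grads)
    norms = jnp.sqrt(sum(jax.tree_util.tree_leaves(sq_norms)))

    # Scale each example's gradient so its norm is at most l2_clip, then sum.
    scale = jnp.minimum(1.0, l2_clip / (norms + 1e-12))
    clipped_sum = jax.tree_util.tree_map(
        lambda g: jnp.einsum('b,b...->...', scale, g), grads)

    # Add Gaussian noise calibrated to l2_clip, then average over the batch.
    batch_size = norms.shape[0]
    leaves, treedef = jax.tree_util.tree_flatten(clipped_sum)
    keys = jax.random.split(key, len(leaves))
    noisy = [(g + noise_multiplier * l2_clip * jax.random.normal(k, g.shape)) / batch_size
             for g, k in zip(leaves, keys)]
    return jax.tree_util.tree_unflatten(treedef, noisy)
```

At the batch sizes studied in the paper (up to 2M examples), the per-example gradients would in practice be accumulated over many microbatches and devices; the sketch only shows the clipping-and-noising logic for a single in-memory batch.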
