
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise
Papers citing "ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise"
| Title | Authors |
|---|---|
| Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh |