
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Minjia Zhang
Yuxiong He
Papers citing "Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping"
39 / 39 papers shown
Title |
---|
![]() Mixed Precision Training Paulius Micikevicius Sharan Narang Jonah Alben G. Diamos Erich Elsen ...Boris Ginsburg Michael Houston Oleksii Kuchaiev Ganesh Venkatesh Hao Wu |