
Effective Theory of Transformers at Initialization
Papers citing "Effective Theory of Transformers at Initialization"
45 / 45 papers shown
Title |
---|
Mixed Precision Training. Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ... Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu |