1906.03496
Making Asynchronous Stochastic Gradient Descent Work for Transformers
8 June 2019
Alham Fikri Aji
Kenneth Heafield
Papers citing "Making Asynchronous Stochastic Gradient Descent Work for Transformers"
2 / 2 papers shown
Title
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi
Saar Barkai
Moshe Gabel
Assaf Schuster
16
23
0
26 Jul 2019