Making Asynchronous Stochastic Gradient Descent Work for Transformers

Alham Fikri Aji, Kenneth Heafield
8 June 2019 · arXiv:1906.03496

Papers citing "Making Asynchronous Stochastic Gradient Descent Work for Transformers"

2 / 2 papers shown
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov
MoE · 27 Jan 2023

Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
26 Jul 2019