2302.06173
Cited By
SWIFT: Expedited Failure Recovery for Large-scale DNN Training
13 February 2023
Keon Jang
Hassan M. G. Wassel
Behnam Montazeri
Michael Ryan
David Wetherall
Papers citing "SWIFT: Expedited Failure Recovery for Large-scale DNN Training"
2 / 2 papers shown
Title
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,821
0
17 Sep 2019
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,198
0
01 Sep 2014