Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster
25 March 2020 · arXiv: 2003.11666
Papers citing "Pipelined Backpropagation at Scale: Training Large Models without Batches" (9 papers)
| Title | Authors | Topic | Metrics | Date |
|---|---|---|---|---|
| Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks | Soham De, Samuel L. Smith | ODL | 32 · 20 · 0 | 24 Feb 2020 |
| Dissecting the Graphcore IPU Architecture via Microbenchmarking | Zhe Jia, Blake Tillman, Marco Maggioni, D. Scarpazza | — | 26 · 134 · 0 | 07 Dec 2019 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro | MoE | 279 · 1,861 · 0 | 17 Sep 2019 |
| Fully Decoupled Neural Network Learning Using Delayed Gradients | Huiping Zhuang, Yi Wang, Qinglai Liu, Shuai Zhang, Zhiping Lin | FedML | 34 · 30 · 0 | 21 Jun 2019 |
| Measuring the Effects of Data Parallelism on Neural Network Training | Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang | — | 63 · 408 · 0 | 08 Nov 2018 |
| Group Normalization | Yuxin Wu, Kaiming He | — | 121 · 3,626 · 0 | 22 Mar 2018 |
| YellowFin and the Art of Momentum Tuning | Jian Zhang, Ioannis Mitliagkas | ODL | 37 · 108 · 0 | 12 Jun 2017 |
| Revisiting Distributed Synchronous SGD | Jianmin Chen, Xinghao Pan, R. Monga, Samy Bengio, Rafal Jozefowicz | — | 53 · 799 · 0 | 04 Apr 2016 |
| Identity Mappings in Deep Residual Networks | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | — | 259 · 10,149 · 0 | 16 Mar 2016 |