Measuring the Effects of Data Parallelism on Neural Network Training
arXiv:1811.03600 · 8 November 2018
Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

Papers citing "Measuring the Effects of Data Parallelism on Neural Network Training" (50 of 107 papers shown)

Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker · 22 Jun 2021

On Large-Cohort Training for Federated Learning
Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith · FedML · 15 Jun 2021

NG+: A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning
Minghan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu · 14 Jun 2021

Federated Learning with Buffered Asynchronous Aggregation
John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael G. Rabbat, Mani Malek, Dzmitry Huba · FedML · 11 Jun 2021

Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier · MoE · 04 Jun 2021

Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You · ODL · 01 Jun 2021

How to decay your learning rate
Aitor Lewkowycz · 23 Mar 2021

Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload
Johan Kok Zhi Kang, S. Tan, Feng Cheng, Shixuan Sun, Bingsheng He · 23 Mar 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 24 Feb 2021

Consensus Control for Decentralized Deep Learning
Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich · 09 Feb 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He · MoE · 18 Jan 2021

Learning from History for Byzantine Robust Optimization
Sai Praneeth Karimireddy, Lie He, Martin Jaggi · FedML, AAML · 18 Dec 2020

Exploring the limits of Concurrency in ML Training on Google TPUs
Sameer Kumar, James Bradbury, C. Young, Yu Emma Wang, Anselm Levskaya, ..., Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing · BDL, AIMat, MoE, LRM · 07 Nov 2020

Identifying Exoplanets with Deep Learning. IV. Removing Stellar Activity Signals from Radial Velocity Measurements Using Neural Networks
Zoe L. de Beurs, A. Vanderburg, Christopher J. Shallue, X. Dumusque, A. Cameron, ..., K. Rice, D. Sasselov, A. Sozzetti, S. Udry, C. Watson · 30 Oct 2020

Anti-Distillation: Improving reproducibility of deep networks
G. Shamir, Lorenzo Coviello · 19 Oct 2020

FPRaker: A Processing Element For Accelerating Neural Network Training
Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo Vivancos, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos · 15 Oct 2020

Training independent subnetworks for robust prediction
Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran · UQCV, OOD · 13 Oct 2020

VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Andrew Or, Haoyu Zhang, M. Freedman · 20 Sep 2020

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein · 17 Aug 2020

Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training
Geoffrey X. Yu, Tovi Grossman, Gennady Pekhimenko · 15 Aug 2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin · ODL · 09 Jul 2020

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts · ODL · 16 Jun 2020

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, J. Lee, Tengyu Ma · 15 Jun 2020

Directional convergence and alignment in deep learning
Ziwei Ji, Matus Telgarsky · 11 Jun 2020

Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster · 25 Mar 2020

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith · ODL · 24 Feb 2020

The Two Regimes of Deep Network Training
Guillaume Leclerc, A. Madry · 24 Feb 2020

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste · MoMe · 07 Jan 2020

Exploring Benefits of Transfer Learning in Neural Machine Translation
Tom Kocmi · 06 Jan 2020

Energy Efficient Federated Learning Over Wireless Communication Networks
Zhaohui Yang, Mingzhe Chen, Walid Saad, Choong Seon Hong, M. Shikh-Bahaei · 06 Nov 2019

Small-GAN: Speeding Up GAN Training Using Core-sets
Samarth Sinha, Hang Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena · GAN · 29 Oct 2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu · AIMat · 23 Oct 2019

Characterizing Deep Learning Training Workloads on Alibaba-PAI
Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia · 14 Oct 2019

Hierarchical Federated Learning Across Heterogeneous Cellular Networks
Mehdi Salehi Heydar Abad, Emre Ozfatura, Deniz Gunduz, Ozgur Ercetin · FedML · 05 Sep 2019

Encoder-Agnostic Adaptation for Conditional Language Generation
Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, Alexander M. Rush · AI4CE · 19 Aug 2019

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal, Eiman Ebrahimi, A. Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, D. Nellans, Puneet Gupta · 30 Jul 2019

Faster Neural Network Training with Data Echoing
Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl · 12 Jul 2019

Fast Training of Sparse Graph Neural Networks on Dense Hardware
Matej Balog, B. V. Merrienboer, Subhodeep Moitra, Yujia Li, Daniel Tarlow · GNN · 27 Jun 2019

Associated Learning: Decomposing End-to-end Backpropagation based on Auto-encoders and Target Propagation
Yu-Wei Kao, Hung-Hsuan Chen · BDL · 13 Jun 2019

Reducing the variance in online optimization by transporting past gradients
Sébastien M. R. Arnold, Pierre-Antoine Manzagol, Reza Babanezhad, Ioannis Mitliagkas, Nicolas Le Roux · 08 Jun 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi · 31 May 2019

Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Wu Dong, Murat Keçeli, Rafael Vescovi, Hanyu Li, Corey Adams, ..., T. Uram, V. Vishwanath, N. Ferrier, B. Kasthuri, P. Littlewood · FedML, AI4CE · 13 May 2019

The Scientific Method in the Science of Machine Learning
Jessica Zosa Forde, Michela Paganini · 24 Apr 2019

Reducing Noise in GAN Training with Variance Reduced Extragradient
Tatjana Chavdarova, Gauthier Gidel, F. Fleuret, Simon Lacoste-Julien · 18 Apr 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh · ODL · 01 Apr 2019

Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
Nikoli Dryden, N. Maruyama, Tom Benson, Tim Moon, M. Snir, B. Van Essen · 15 Mar 2019

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba · ODL · 21 Feb 2019

Augment your batch: better training with larger batches
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry · ODL · 27 Jan 2019

CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, Peter R. Pietzuch · 08 Jan 2019

Distilling Information from a Flood: A Possibility for the Use of Meta-Analysis and Systematic Review in Machine Learning Research
Peter Henderson, Emma Brunskill · AI4CE · 03 Dec 2018