Pipelined Backpropagation at Scale: Training Large Models without Batches

25 March 2020

Papers citing "Pipelined Backpropagation at Scale: Training Large Models without Batches"


Title
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks Soham De Samuel L. Smith ODL 32 20 0 24 Feb 2020
Dissecting the Graphcore IPU Architecture via Microbenchmarking Zhe Jia Blake Tillman Marco Maggioni D. Scarpazza 26 134 0 07 Dec 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Mohammad Shoeybi M. Patwary Raul Puri P. LeGresley Jared Casper Bryan Catanzaro MoE 279 1,861 0 17 Sep 2019
Fully Decoupled Neural Network Learning Using Delayed Gradients Huiping Zhuang Yi Wang Qinglai Liu Shuai Zhang Zhiping Lin FedML 34 30 0 21 Jun 2019
Measuring the Effects of Data Parallelism on Neural Network Training Christopher J. Shallue Jaehoon Lee J. Antognini Jascha Sohl-Dickstein Roy Frostig George E. Dahl 63 408 0 08 Nov 2018
Group Normalization Yuxin Wu Kaiming He 121 3,626 0 22 Mar 2018
YellowFin and the Art of Momentum Tuning Jian Zhang Ioannis Mitliagkas ODL 37 108 0 12 Jun 2017
Revisiting Distributed Synchronous SGD Jianmin Chen Xinghao Pan R. Monga Samy Bengio Rafal Jozefowicz 53 799 0 04 Apr 2016
Identity Mappings in Deep Residual Networks Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun 259 10,149 0 16 Mar 2016