ResearchTrend.AI
Pipelined Backpropagation at Scale: Training Large Models without Batches

25 March 2020
Atli Kosson
Vitaliy Chiley
Abhinav Venigalla
Joel Hestness
Urs Koster

Papers citing "Pipelined Backpropagation at Scale: Training Large Models without Batches"

9 / 9 papers shown
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De
Samuel L. Smith
ODL
32
20
0
24 Feb 2020
Dissecting the Graphcore IPU Architecture via Microbenchmarking
Zhe Jia
Blake Tillman
Marco Maggioni
D. Scarpazza
26
134
0
07 Dec 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
279
1,861
0
17 Sep 2019
Fully Decoupled Neural Network Learning Using Delayed Gradients
Huiping Zhuang
Yi Wang
Qinglai Liu
Shuai Zhang
Zhiping Lin
FedML
34
30
0
21 Jun 2019
Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue
Jaehoon Lee
J. Antognini
Jascha Sohl-Dickstein
Roy Frostig
George E. Dahl
63
408
0
08 Nov 2018
Group Normalization
Yuxin Wu
Kaiming He
121
3,626
0
22 Mar 2018
YellowFin and the Art of Momentum Tuning
Jian Zhang
Ioannis Mitliagkas
ODL
37
108
0
12 Jun 2017
Revisiting Distributed Synchronous SGD
Jianmin Chen
Xinghao Pan
R. Monga
Samy Bengio
Rafal Jozefowicz
53
799
0
04 Apr 2016
Identity Mappings in Deep Residual Networks
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
259
10,149
0
16 Mar 2016