Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training

20 November 2024
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn
arXiv: 2411.13055

Papers citing "Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training"

11 / 61 papers shown
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell, Ananya Ganesh, Andrew McCallum
76 · 2,660 · 0 · 05 Jun 2019

Accelerating Convolutional Neural Networks via Activation Map Compression
Georgios Georgiadis
75 · 76 · 0 · 10 Dec 2018

Mesh-TensorFlow: Deep Learning for Supercomputers
Noam M. Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, ..., HyoukJoong Lee, O. Milenkovic, C. Young, Ryan Sepassi, Blake Hechtman
GNN, MoE, AI4CE
89 · 392 · 0 · 05 Nov 2018

Beyond Data and Model Parallelism for Deep Neural Networks
Zhihao Jia, Matei A. Zaharia, A. Aiken
GNN, AI4CE
64 · 506 · 0 · 14 Jul 2018

PipeDream: Fast and Efficient Pipeline Parallel DNN Training
A. Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, G. Ganger, Phillip B. Gibbons
AI4CE
63 · 254 · 0 · 08 Jun 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
187 · 1,070 · 0 · 24 May 2018

Integrated Model, Batch and Domain Parallelism in Training Neural Networks
A. Gholami, A. Azad, Peter H. Jin, Kurt Keutzer, A. Buluç
79 · 84 · 0 · 12 Dec 2017

Mixed Precision Training
Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
176 · 1,805 · 0 · 10 Oct 2017

In-Datacenter Performance Analysis of a Tensor Processing Unit
N. Jouppi, C. Young, Nishant Patil, David Patterson, Gaurav Agrawal, ..., Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon
237 · 4,644 · 0 · 16 Apr 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
MoE
253 · 2,692 · 0 · 23 Jan 2017

Training Deep Nets with Sublinear Memory Cost
Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin
106 · 1,174 · 0 · 21 Apr 2016