Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

30 July 2018 · arXiv:1807.11205
Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Li Yu, Tiegang Chen, Guangxiao Hu, S. Shi, Xiaowen Chu

Papers citing "Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes"

Showing 50 of 68 citing papers.

DejAIvu: Identifying and Explaining AI Art on the Web in Real-Time with Saliency Maps
Jocelyn Dzuong · 12 Feb 2025

Importance Sampling via Score-based Generative Models
Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana · MedIm, DiffM · 07 Feb 2025

How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison · 22 May 2024

PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh, Junguk Hong, Chaemin Lim, Seong-Yeol Park, Jeehyun Kim, Hanjun Kim, Youngsok Kim, Jinho Lee · 13 Apr 2024

Speeding up and reducing memory usage for scientific machine learning via mixed precision
Joel Hayford, Jacob Goldman-Wetzler, Eric Wang, Lu Lu · 30 Jan 2024

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao · VLM · 07 Apr 2023

RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs
A. M. Ribeiro-dos-Santos, João Dinis Ferreira, O. Mutlu, G. Falcão · MQ · 15 Jan 2023

Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Mingliang Xu, Gongrui Nan, Yuxin Zhang, Rongrong Ji · MQ · 12 Nov 2022

Large-batch Optimization for Dense Visual Predictions
Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo · VLM · 20 Oct 2022

Towards Efficient Communications in Federated Learning: A Contemporary Survey
Zihao Zhao, Yuzhu Mao, Yang Liu, Linqi Song, Ouyang Ye, Xinlei Chen, Wenbo Ding · FedML · 02 Aug 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, S. Shi, Wei Wang, Bo-wen Li · 30 Jun 2022

Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim · 15 May 2022

TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang, Moein Khazraee, Zhizhen Zhong, M. Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, A. Kewitsch · 01 Feb 2022

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You · 01 Nov 2021

BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny, Marina Neseem, Sherief Reda · MQ · 29 Oct 2021

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, T. Krishna · GNN · 09 Oct 2021

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021

Complexity-aware Adaptive Training and Inference for Edge-Cloud Distributed AI Systems
Yinghan Long, I. Chakraborty, G. Srinivasan, Kaushik Roy · 14 Sep 2021

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
S. Shi, Lin Zhang, Bo-wen Li · 14 Jul 2021

ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun, Cameron R. Wolfe, C. Jermaine, Anastasios Kyrillidis · 02 Jul 2021

Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You · ODL · 01 Jun 2021

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, ..., Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, K. Gopalakrishnan · 21 Apr 2021

An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
A. Kahira, Truong Thao Nguyen, L. Bautista-Gomez, Ryousei Takano, Rosa M. Badia, Mohamed Wahib · 19 Apr 2021

Large Batch Simulation for Deep Reinforcement Learning
Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, V. Koltun, Kayvon Fatahalian · 3DV, OffRL, AI4CE · 12 Mar 2021

GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training
Krishnateja Killamsetty, D. Sivasubramanian, Ganesh Ramakrishnan, A. De, Rishabh K. Iyer · OOD · 27 Feb 2021

GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent
Heesu Kim, Hanmin Park, Taehyun Kim, Kwanheum Cho, Eojin Lee, Soojung Ryu, Hyuk-Jae Lee, Kiyoung Choi, Jinho Lee · 15 Feb 2021

Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin · 09 Feb 2021

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
Y. Fu, Haoran You, Yang Katie Zhao, Yue Wang, Chaojian Li, K. Gopalakrishnan, Zhangyang Wang, Yingyan Lin · MQ · 24 Dec 2020

Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar · 16 Dec 2020

VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Andrew Or, Haoyu Zhang, M. Freedman · 20 Sep 2020

Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
Yawen Wu, Zhepeng Wang, Yiyu Shi, Jiaxi Hu · 07 Jul 2020

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, ..., Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin · 02 Jul 2020

The Limit of the Batch Size
Yang You, Yuhui Wang, Huan Zhang, Zhao-jie Zhang, J. Demmel, Cho-Jui Hsieh · 15 Jun 2020

HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism
Jay H. Park, Gyeongchan Yun, Chang Yi, N. T. Nguyen, Seungmin Lee, Jaesik Choi, S. Noh, Young-ri Choi · MoE · 28 May 2020

A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses
Malik Boudiaf, Jérôme Rony, Imtiaz Masud Ziko, Eric Granger, M. Pedersoli, Pablo Piantanida, Ismail Ben Ayed · SSL · 19 Mar 2020

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao · 06 Mar 2020

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition
Xiaodong Cui, Wei Zhang, Ulrich Finkler, G. Saon, M. Picheny, David S. Kung · 24 Feb 2020

Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs
Qiang-qiang Wang, S. Shi, Canhui Wang, Xiaowen Chu · 24 Feb 2020

Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection
Zhenheng Tang, S. Shi, Xiaowen Chu · FedML · 22 Feb 2020

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste · MoMe · 07 Jan 2020

Optimization for deep learning: theory and algorithms
Ruoyu Sun · ODL · 19 Dec 2019

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning
S. Shi, Xiaowen Chu, Bo Li · FedML · 18 Dec 2019

Understanding Top-k Sparsification in Distributed Deep Learning
S. Shi, Xiaowen Chu, Ka Chun Cheung, Simon See · 20 Nov 2019

Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees
S. Shi, Zhenheng Tang, Qiang-qiang Wang, Kaiyong Zhao, Xiaowen Chu · 20 Nov 2019

Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs
Liu Yang, Sean Treichler, Thorsten Kurth, Keno Fischer, D. Barajas-Solano, ..., Valentin Churavy, A. Tartakovsky, Michael Houston, P. Prabhat, George Karniadakis · AI4CE · 29 Oct 2019

MLPerf Training Benchmark
Arya D. McCarthy, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, ..., Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, C. Young, Matei A. Zaharia · 02 Oct 2019

Lookahead Optimizer: k steps forward, 1 step back
Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, Jimmy Ba · ODL · 19 Jul 2019

Fast Training of Sparse Graph Neural Networks on Dense Hardware
Matej Balog, B. V. Merrienboer, Subhodeep Moitra, Yujia Li, Daniel Tarlow · GNN · 27 Jun 2019

Database Meets Deep Learning: Challenges and Opportunities
Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Beng Chin Ooi, K. Tan · 21 Jun 2019

Deep Leakage from Gradients
Ligeng Zhu, Zhijian Liu, Song Han · FedML · 21 Jun 2019