Large-Batch Training for LSTM and Beyond
Yang You, Jonathan Hseu, Chris Ying, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
arXiv:1901.08256 · 24 January 2019
Papers citing "Large-Batch Training for LSTM and Beyond" (38 of 38 papers shown)
1. Accelerating Deep Neural Network Training via Distributed Hybrid Order Optimization — Shunxian Gu, Chaoqun You, Bangbang Ren, Lailong Luo, Junxu Xia, Deke Guo (02 May 2025; cited by 0)
2. Uncertainty-Informed Volume Visualization using Implicit Neural Representation — Shanu Saklani, Chitwan Goel, Shrey Bansal, Zhe Wang, Soumya Dutta, Tushar M. Athawale, D. Pugmire, Christopher R. Johnson (12 Aug 2024; cited by 0)
3. Full-Stack Allreduce on Multi-Rail Networks — Enda Yu, Dezun Dong, Xiangke Liao (28 May 2024; cited by 0)
4. Visual Analysis of Prediction Uncertainty in Neural Networks for Deep Image Synthesis — Soumya Dutta, Faheem Nizar, Ahmad Amaan, Ayan Acharya (22 May 2024; cited by 1)
5. Mastery Guided Non-parametric Clustering to Scale-up Strategy Prediction — Anup Shakya, Vasile Rus, Deepak Venugopal (04 Jan 2024; cited by 0)
6. Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization — Hamidreza Almasi, Harshit Mishra, Balajee Vamanan, Sathya Ravi (12 Feb 2023; cited by 0)
7. Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules — J. Choi, Pei Zhang, Kshitij Mehta, Andrew E. Blanchard, Massimiliano Lupo Pasini (22 Jul 2022; cited by 9)
8. Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences — G. Moon, E. Cyr (07 Mar 2022; cited by 5)
9. Distributed SLIDE: Enabling Training Large Neural Networks on Low Bandwidth and Simple CPU-Clusters via Model Parallelism and Sparsity — Minghao Yan, Nicholas Meisburger, Tharun Medini, Anshumali Shrivastava (29 Jan 2022; cited by 6)
10. Near-Optimal Sparse Allreduce for Distributed Deep Learning — Shigang Li, Torsten Hoefler (19 Jan 2022; cited by 51)
11. COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression — Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao (18 Nov 2021; cited by 23)
12. Optimizing Neural Network for Computer Vision task in Edge Device — S. RanjithM, S. Parameshwara, A. PavanYadav, Shriganesh Hegde (02 Oct 2021; cited by 1)
13. Stochastic Training is Not Necessary for Generalization — Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein (29 Sep 2021; cited by 72)
14. 4-bit Quantization of LSTM-based Speech Recognition Models — A. Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, ..., Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, K. Gopalakrishnan (27 Aug 2021; cited by 21)
15. Logit Attenuating Weight Normalization — Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi (12 Aug 2021; cited by 1)
16. Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters — Chen Sun, Shenggui Li, Jinyue Wang, Jun Yu (08 Aug 2021; cited by 47)
17. Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines — Shigang Li, Torsten Hoefler (14 Jul 2021; cited by 132)
18. CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation — Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao (21 Jun 2021; cited by 5)
19. NG+: A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning — Minghan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu (14 Jun 2021; cited by 4)
20. Concurrent Adversarial Learning for Large-Batch Training — Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You (01 Jun 2021; cited by 13)
21. DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training — Kun Yuan, Yiming Chen, Xinmeng Huang, Yingya Zhang, Pan Pan, Yinghui Xu, W. Yin (24 Apr 2021; cited by 61)
22. Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient — Fengli Gao, Huicai Zhong (16 Dec 2020; cited by 9)
23. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning — Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, G. Ganger, Eric Xing (27 Aug 2020; cited by 179)
24. Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training — Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li (28 Jul 2020; cited by 8)
25. AdaScale SGD: A User-Friendly Algorithm for Distributed Training — Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin (09 Jul 2020; cited by 37)
26. DAPPLE: A Pipelined Data Parallel Approach for Training Large Models — Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, ..., Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin (02 Jul 2020; cited by 233)
27. Convolutional Neural Network Training with Distributed K-FAC — J. G. Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian Foster (01 Jul 2020; cited by 30)
28. Extrapolation for Large-batch Training in Deep Learning — Tao R. Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi (10 Jun 2020; cited by 36)
29. A Quantitative Survey of Communication Optimizations in Distributed Deep Learning — S. Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li (27 May 2020; cited by 3)
30. Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures — Dhiraj D. Kalamkar, E. Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, A. Heinecke (10 May 2020; cited by 48)
31. Communication optimization strategies for distributed deep neural network training: A survey — Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao (06 Mar 2020; cited by 12)
32. Large Batch Training Does Not Need Warmup — Zhouyuan Huo, Bin Gu, Heng-Chiao Huang (04 Feb 2020; cited by 5)
33. Distributed Learning of Deep Neural Networks using Independent Subnet Training — John Shelton Hyatt, Cameron R. Wolfe, Michael Lee, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine (04 Oct 2019; cited by 35)
34. High-Performance Deep Learning via a Single Building Block — E. Georganas, K. Banerjee, Dhiraj D. Kalamkar, Sasikanth Avancha, Anand Venkat, Michael J. Anderson, G. Henry, Hans Pabst, A. Heinecke (15 Jun 2019; cited by 12)
35. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes — Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh (01 Apr 2019; cited by 985)
36. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis — Tal Ben-Nun, Torsten Hoefler (26 Feb 2018; cited by 704)
37. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation — Yonghui Wu, M. Schuster, Zhehuai Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean (26 Sep 2016; cited by 6,750)
38. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima — N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016; cited by 2,896)