Large Batch Training of Convolutional Networks
Yang You, Igor Gitman, Boris Ginsburg
13 August 2017 · arXiv:1708.03888 · ODL

Papers citing "Large Batch Training of Convolutional Networks"

45 / 545 papers shown
Understanding Generalization through Visualizations
  Yifan Jiang, Z. Emam, Micah Goldblum, Liam H. Fowl, J. K. Terry, Furong Huang, Tom Goldstein
  AI4CE · 07 Jun 2019

Implicit Regularization in Deep Matrix Factorization
  Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
  AI4CE · 31 May 2019

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
  Boris Ginsburg, P. Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Chun Lok Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen
  ODL · 27 May 2019

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension
  Yunfei Teng, Wenbo Gao, F. Chalus, A. Choromańska, D. Goldfarb, Adrian Weller
  24 May 2019

Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning
  Shuai Zheng, James T. Kwok
  ODL · 23 May 2019

Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
  Wu Dong, Murat Keçeli, Rafael Vescovi, Hanyu Li, Corey Adams, ..., T. Uram, V. Vishwanath, N. Ferrier, B. Kasthuri, P. Littlewood
  FedML · AI4CE · 13 May 2019

Fast AutoAugment
  Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, Sungwoong Kim
  01 May 2019

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
  Yanghua Peng, Hang Zhang, Yifei Ma, Tong He, Zhi-Li Zhang, Sheng Zha, Mu Li
  26 Apr 2019

Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds
  Masafumi Yamazaki, Akihiko Kasagi, Akihiro Tabuchi, Takumi Honda, Masahiro Miwa, Naoto Fukumoto, Tsuguchika Tabaru, Atsushi Ike, Kohta Nakashima
  29 Mar 2019

swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight
  Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang
  16 Mar 2019

Communication-efficient distributed SGD with Sketching
  Nikita Ivkin, D. Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, R. Arora
  FedML · 12 Mar 2019

Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
  Xiaolong Li, Yiming Zhou, Zheng Pan, Jiashi Feng
  3DV · 09 Mar 2019

Learned Step Size Quantization
  S. K. Esser, J. McKinstry, Deepika Bablani, R. Appuswamy, D. Modha
  MQ · 21 Feb 2019

Augment your batch: better training with larger batches
  Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
  ODL · 27 Jan 2019

CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
  A. Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, Peter R. Pietzuch
  08 Jan 2019

An Empirical Model of Large-Batch Training
  Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team
  14 Dec 2018

Bag of Tricks for Image Classification with Convolutional Neural Networks
  Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
  04 Dec 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
  Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez
  30 Nov 2018

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
  Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
  13 Nov 2018

A Hitchhiker's Guide On Distributed Training of Deep Neural Networks
  K. Chahal, Manraj Singh Grover, Kuntal Dey
  3DH · OOD · 28 Oct 2018

Exascale Deep Learning for Climate Analytics
  Thorsten Kurth, Sean Treichler, Josh Romero, M. Mudigonda, Nathan Luehr, ..., Michael A. Matheson, J. Deslippe, M. Fatica, P. Prabhat, Michael Houston
  BDL · 03 Oct 2018

Large batch size training of neural networks with adversarial training and second-order information
  Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney
  ODL · 02 Oct 2018

The Convergence of Sparsified Gradient Methods
  Dan Alistarh, Torsten Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
  27 Sep 2018

Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant
  D. Loroch, Franz-Josef Pfreundt, Norbert Wehn, J. Keuper
  27 Aug 2018

Don't Use Large Mini-Batches, Use Local SGD
  Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
  22 Aug 2018

CosmoFlow: Using Deep Learning to Learn the Universe at Scale
  Amrita Mathuriya, Deborah Bard, P. Mendygral, Lawrence Meadows, James A. Arnemann, ..., Nalini Kumar, S. Ho, Michael F. Ringenburg, P. Prabhat, Victor W. Lee
  AI4CE · 14 Aug 2018

Fast, Better Training Trick -- Random Gradient
  Jiakai Wei
  ODL · 13 Aug 2018

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
  Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, S. Shi, Xiaowen Chu
  30 Jul 2018

Kernel machines that adapt to GPUs for effective large batch training
  Siyuan Ma, M. Belkin
  15 Jun 2018

PANDA: Facilitating Usable AI Development
  Jinyang Gao, Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Guoliang Li, Teck Khim Ng, Beng Chin Ooi, Sheng Wang, Jingren Zhou
  26 Apr 2018

GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent
  J. Daily, Abhinav Vishnu, Charles Siegel, T. Warfel, Vinay C. Amatya
  15 Mar 2018

Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation
  Huishuai Zhang, Wei-neng Chen, Tie-Yan Liu
  27 Feb 2018

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
  Tal Ben-Nun, Torsten Hoefler
  GNN · 26 Feb 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
  Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
  22 Feb 2018

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
  Nicholas Carlini, Chang-rui Liu, Ulfar Erlingsson, Jernej Kos, D. Song
  22 Feb 2018

SparCML: High-Performance Sparse Communication for Machine Learning
  Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler
  22 Feb 2018

A Progressive Batching L-BFGS Method for Machine Learning
  Raghu Bollapragada, Dheevatsa Mudigere, J. Nocedal, Hao-Jun Michael Shi, P. T. P. Tang
  ODL · 15 Feb 2018

Convergence Analysis of Gradient Descent Algorithms with Proportional Updates
  Igor Gitman, D. Dilipkumar, Ben Parr
  09 Jan 2018

Parallel Complexity of Forward and Backward Propagation
  Maxim Naumov
  18 Dec 2017

Integrated Model, Batch and Domain Parallelism in Training Neural Networks
  A. Gholami, A. Azad, Peter H. Jin, Kurt Keutzer, A. Buluç
  12 Dec 2017

Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
  Shankar Krishnan, Ying Xiao, Rif A. Saurous
  ODL · 08 Dec 2017

Don't Decay the Learning Rate, Increase the Batch Size
  Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
  ODL · 01 Nov 2017

Slim-DP: A Light Communication Data Parallelism for DNN
  Shizhao Sun, Wei-neng Chen, Jiang Bian, Xiaoguang Liu, Tie-Yan Liu
  27 Sep 2017

Train longer, generalize better: closing the generalization gap in large batch training of neural networks
  Elad Hoffer, Itay Hubara, Daniel Soudry
  ODL · 24 May 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
  N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
  ODL · 15 Sep 2016