Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH
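As context for the list below: the cited paper's headline recipe is the linear scaling rule (when the minibatch size is multiplied by k, multiply the learning rate by k) combined with a gradual warmup from the base rate to the scaled rate over the first five epochs. A minimal Python sketch of that schedule at epoch granularity (the paper ramps per iteration); all names and default values here are illustrative, not taken from any framework:

def lr_at_epoch(epoch, base_lr=0.1, base_batch=256, batch=8192,
                warmup_epochs=5):
    """Learning rate under the linear scaling rule with gradual warmup.

    The target rate is base_lr * (batch / base_batch); training starts
    at base_lr and ramps linearly to the target over warmup_epochs.
    This is a simplified, epoch-granularity sketch of the schedule
    described in Goyal et al. (2017).
    """
    target_lr = base_lr * batch / base_batch
    if epoch < warmup_epochs:
        # Linear interpolation from the base rate up to the scaled target.
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    return target_lr

if __name__ == "__main__":
    # With the paper's reference setting (0.1 at batch 256) scaled to
    # batch 8192, the rate ramps from 0.1 to 3.2 over five epochs.
    for e in range(7):
        print(f"epoch {e}: lr = {lr_at_epoch(e):.3f}")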

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

Showing 50 of 2,054 citing papers.
Parallel Complexity of Forward and Backward Propagation
Maxim Naumov
52
8
0
18 Dec 2017
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma
Raef Bassily
M. Belkin
117
291
0
18 Dec 2017
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
A. Gholami
A. Azad
Peter H. Jin
Kurt Keutzer
A. Buluç
95
84
0
12 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
45
20
0
08 Dec 2017
AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen
Jungwook Choi
D. Brand
A. Agrawal
Wei Zhang
K. Gopalakrishnan
ODL
79
174
0
07 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
112
136
0
06 Dec 2017
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin
Song Han
Huizi Mao
Yu Wang
W. Dally
231
1,413
0
05 Dec 2017
State-of-the-art Speech Recognition With Sequence-to-Sequence Models
Chung-Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
...
Katya Gonina
Navdeep Jaitly
Yue Liu
J. Chorowski
M. Bacchiani
AI4TS
174
1,155
0
05 Dec 2017
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
258
3,042
0
30 Nov 2017
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
366
8,940
0
21 Nov 2017
MegDet: A Large Mini-Batch Object Detector
Chao Peng
Tete Xiao
Zeming Li
Yuning Jiang
Xiangyu Zhang
Kai Jia
Gang Yu
Jian Sun
ObjD
209
318
0
20 Nov 2017
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Ziming Zhang
Yuanwei Wu
Guanghui Wang
ODL
65
28
0
19 Nov 2017
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Shaoshuai Shi
Qiang-qiang Wang
Xiaowen Chu
90
110
0
16 Nov 2017
AOGNets: Compositional Grammatical Architectures for Deep Learning
Xilai Li
Xi Song
Tianfu Wu
72
26
0
15 Nov 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
85
463
0
13 Nov 2017
Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
Takuya Akiba
Shuji Suzuki
Keisuke Fukuda
VLM
76
314
0
12 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu
Damian Podareanu
V. Saletore
70
55
0
12 Nov 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
Sameer Kumar
D. Sreedhar
Vaibhav Saxena
Yogish Sabharwal
Ashish Verma
63
4
0
02 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
130
996
0
01 Nov 2017
ChainerMN: Scalable Distributed Deep Learning Framework
Takuya Akiba
Keisuke Fukuda
Shuji Suzuki
AI4CE, BDL, GNN
65
60
0
31 Oct 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari
Stefano Soatto
MLT
104
304
0
30 Oct 2017
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
323
9,831
0
25 Oct 2017
Asynchronous Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian
Wei Zhang
Ce Zhang
Ji Liu
ODL
75
500
0
18 Oct 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith
Quoc V. Le
BDL
126
253
0
17 Oct 2017
Synkhronos: a Multi-GPU Theano Extension for Data Parallelism
Adam Stooke
Pieter Abbeel
SyDa, GNN
24
0
0
11 Oct 2017
Slim-DP: A Light Communication Data Parallelism for DNN
Shizhao Sun
Wei Chen
Jiang Bian
Xiaoguang Liu
Tie-Yan Liu
24
0
0
27 Sep 2017
Stochastic Nonconvex Optimization with Large Minibatches
Weiran Wang
Nathan Srebro
96
26
0
25 Sep 2017
Deep Sparse Subspace Clustering
Xi Peng
Jiashi Feng
Shijie Xiao
Jiwen Lu
Zhang Yi
Shuicheng Yan
60
22
0
25 Sep 2017
Online Learning of a Memory for Learning Rates
Franziska Meier
Daniel Kappler
S. Schaal
62
21
0
20 Sep 2017
ImageNet Training in Minutes
Yang You
Zhao Zhang
Cho-Jui Hsieh
J. Demmel
Kurt Keutzer
VLM, LRM
186
57
0
14 Sep 2017
What does fault tolerant Deep Learning need from MPI?
Vinay C. Amatya
Abhinav Vishnu
Charles Siegel
J. Daily
74
19
0
11 Sep 2017
An Adaptive Sampling Scheme to Efficiently Train Fully Convolutional Networks for Semantic Segmentation
L. Berger
E. Hyde
M. Jorge Cardoso
Sebastien Ourselin
SSeg
95
41
0
08 Sep 2017
Simple Recurrent Units for Highly Parallelizable Recurrence
Tao Lei
Yu Zhang
Sida I. Wang
Huijing Dai
Yoav Artzi
LRM
165
277
0
08 Sep 2017
Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads
Tian Li
Jie Zhong
Ji Liu
Wentao Wu
Ce Zhang
54
70
0
24 Aug 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
108
518
0
23 Aug 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
169
854
0
13 Aug 2017
Distributed Training Large-Scale Deep Architectures
Shang-Xuan Zou
Chun-Yen Chen
Jui-Lin Wu
Chun-Nan Chou
Chia-Chin Tsao
Kuan-Chieh Tung
Ting-Wei Lin
Cheng-Lung Sung
Edward Y. Chang
53
22
0
10 Aug 2017
Regularizing and Optimizing LSTM Language Models
Stephen Merity
N. Keskar
R. Socher
178
1,098
0
07 Aug 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML, ODL
111
44
0
26 Jul 2017
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
Fartash Faghri
David J. Fleet
J. Kiros
Sanja Fidler
VLM
87
183
0
18 Jul 2017
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures
Joseph Suárez
Clare Zhu
49
0
0
08 Jul 2017
Stochastic, Distributed and Federated Optimization for Machine Learning
Jakub Konecný
FedML
83
38
0
04 Jul 2017
Parle: parallelizing stochastic gradient descent
Pratik Chaudhari
Carlo Baldassi
R. Zecchina
Stefano Soatto
Ameet Talwalkar
Adam M. Oberman
ODL, FedML
85
21
0
03 Jul 2017
Training a Fully Convolutional Neural Network to Route Integrated Circuits
Sambhav R. Jain
Kye L. Okabe
SSL
22
8
0
27 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
89
11
0
18 Jun 2017
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
Levent Sagun
Utku Evci
V. U. Güney
Yann N. Dauphin
Léon Bottou
107
420
0
14 Jun 2017
Training Quantized Nets: A Deeper Understanding
Hao Li
Soham De
Zheng Xu
Christoph Studer
H. Samet
Tom Goldstein
MQ
87
211
0
07 Jun 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
198
803
0
24 May 2017
Insensitive Stochastic Gradient Twin Support Vector Machine for Large Scale Problems
Zhen Wang
Yuan-Hai Shao
Lan Bai
Li-Ming Liu
N. Deng
43
40
0
19 Apr 2017
Deep Relaxation: partial differential equations for optimizing deep neural networks
Pratik Chaudhari
Adam M. Oberman
Stanley Osher
Stefano Soatto
G. Carlier
174
154
0
17 Apr 2017