Large Scale Language Modeling: Converging on 40GB of Text in Four Hours

3 August 2018

Robert M. Kirby, Nikolai Yakovenko, Bryan Catanzaro

Papers citing "Large Scale Language Modeling: Converging on 40GB of Text in Four Hours"

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism. Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar. 30 Dec 2024.
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods. Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar. 20 Jun 2024.
Beyond Human-Level Accuracy: Computational Challenges in Deep Learning. Joel Hestness, Newsha Ardalani, G. Diamos. 03 Sep 2019.
Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection. David Ifeoluwa Adelani, H. Mai, Fuming Fang, H. Nguyen, Junichi Yamagishi, Isao Echizen. 22 Jul 2019.
A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition. Wei Zhang, Xiaodong Cui, Ulrich Finkler, G. Saon, Abdullah Kayi, A. Buyuktosunoglu, Brian Kingsbury, David S. Kung, M. Picheny. 10 Jul 2019.
An Evaluation of Transfer Learning for Classifying Sales Engagement Emails at Large Scale. Yong Liu, Pavel A. Dmitriev, Yifei Huang, Andrew Brooks, Li Dong. 19 Apr 2019.
Distributed Deep Learning Strategies For Automatic Speech Recognition. Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, G. Saon, David S. Kung, M. Picheny. 10 Apr 2019.
Compressing Gradient Optimizers via Count-Sketches. Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava. 01 Feb 2019.
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration. Sangkug Lym, Esha Choukse, Siavash Zangeneh, W. Wen, Sujay Sanghavi, M. Erez. 26 Jan 2019.
An Empirical Model of Large-Batch Training. Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team. 14 Dec 2018.
Practical Text Classification With Large Pre-Trained Language Models. Neel Kant, Raul Puri, Nikolai Yakovenko, Bryan Catanzaro. 04 Dec 2018.
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent. Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez. 30 Nov 2018.
Language Modeling at Scale. Md. Mostofa Ali Patwary, Milind Chabbi, Heewoo Jun, Jiaji Huang, G. Diamos, Kenneth Church. 23 Oct 2018.
Large batch size training of neural networks with adversarial training and second-order information. Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney. 02 Oct 2018.
Scaling Neural Machine Translation. Myle Ott, Sergey Edunov, David Grangier, Michael Auli. 01 Jun 2018.
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. Tal Ben-Nun, Torsten Hoefler. 26 Feb 2018.
