ResearchTrend.AI
arXiv:1904.00962
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
    ODL
ArXiv (abs) | PDF | HTML | GitHub (1698★)

Papers citing "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"

50 / 611 papers shown
The birth of Romanian BERT
Stefan Daniel Dumitrescu
Andrei-Marius Avram
S. Pyysalo
VLM
63
78
0
18 Sep 2020
Pay Attention when Required
Swetha Mandava
Szymon Migacz
A. Fit-Florea
96
11
0
09 Sep 2020
UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis
G. Vlad
George-Eduard Zaharia
Dumitru-Clementin Cercel
Costin-Gabriel Chiru
Stefan Trausan-Matu
76
31
0
06 Sep 2020
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark
Haoran Li
Abhinav Arora
Shuohui Chen
Anchit Gupta
Sonal Gupta
Yashar Mehdad
131
179
0
21 Aug 2020
A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
Fareed Qararyah
Mohamed Wahib
Douga Dikbayir
M. E. Belviranli
Didem Unat
71
10
0
19 Aug 2020
Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition
Henry Tsai
Jayden Ooi
Chun-Sung Ferng
Hyung Won Chung
Jason Riesa
ViT
80
21
0
15 Aug 2020
A community-powered search of machine learning strategy space to find NMR property prediction models
Lars A. Bratholm
W. Gerrard
Brandon M. Anderson
Shaojie Bai
Sunghwan Choi
...
A. Torrubia
Devin Willmott
C. Butts
David R. Glowacki
Kaggle participants
50
17
0
13 Aug 2020
Variance-reduced Language Pretraining via a Mask Proposal Network
Liang Chen
SSL
58
8
0
12 Aug 2020
Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
Sai Praneeth Karimireddy
Martin Jaggi
Satyen Kale
M. Mohri
Sashank J. Reddi
Sebastian U. Stich
A. Suresh
FedML
179
219
0
08 Aug 2020
Pretraining Techniques for Sequence-to-Sequence Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Yi-Chiao Wu
Hirokazu Kameoka
Tomoki Toda
127
40
0
07 Aug 2020
Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Shen-Yi Zhao
Chang-Wei Shi
Yin-Peng Xie
Wu-Jun Li
ODL
87
10
0
28 Jul 2020
CSER: Communication-efficient SGD with Error Reset
Cong Xie
Shuai Zheng
Oluwasanmi Koyejo
Indranil Gupta
Mu Li
Yanghua Peng
110
40
0
26 Jul 2020
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing
Ahmed Elnaggar
M. Heinzinger
Christian Dallago
Ghalia Rehawi
Yu Wang
...
Tamas B. Fehér
Christoph Angerer
Martin Steinegger
D. Bhowmik
B. Rost
DRL
95
970
0
13 Jul 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson
Pulkit Agrawal
Haijie Gu
Carlos Guestrin
ODL
90
37
0
09 Jul 2020
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt
Frank Schneider
Philipp Hennig
ODL
251
169
0
03 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
149
135
0
30 Jun 2020
Learning compositional functions via multiplicative weight updates
Jeremy Bernstein
Jiawei Zhao
M. Meister
Xuan Li
Anima Anandkumar
Yisong Yue
83
27
0
25 Jun 2020
DeepTopPush: Simple and Scalable Method for Accuracy at the Top
V. Mácha
Lukáš Adam
Václav Smídl
65
2
0
22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
121
2
0
21 Jun 2020
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
F. Iandola
Albert Eaton Shaw
Ravi Krishna
Kurt Keutzer
VLM
95
128
0
19 Jun 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
121
15
0
15 Jun 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction
Adrian Łańcucki
123
342
0
11 Jun 2020
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Zhenhui Xu
Linyuan Gong
Guolin Ke
Di He
Shuxin Zheng
Liwei Wang
Jiang Bian
Tie-Yan Liu
BDL
65
18
0
10 Jun 2020
Extrapolation for Large-batch Training in Deep Learning
Tao R. Lin
Lingjing Kong
Sebastian U. Stich
Martin Jaggi
103
36
0
10 Jun 2020
Knowledge Distillation: A Survey
Jianping Gou
B. Yu
Stephen J. Maybank
Dacheng Tao
VLM
354
3,032
0
09 Jun 2020
Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers
Tsung-Han Wu
Chun-Chen Hsieh
Yen-Hao Chen
Po-Han Chi
Hung-yi Lee
53
1
0
09 Jun 2020
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach
Maksym Andriushchenko
Dietrich Klakow
191
363
0
08 Jun 2020
Scaling Distributed Training with Adaptive Summation
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Olli Saarikivi
Tianju Xu
Vadim Eksarevskiy
Jaliya Ekanayake
Emad Barsoum
32
9
0
04 Jun 2020
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Z. Yao
A. Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
ODL
172
287
0
01 Jun 2020
Training Keyword Spotting Models on Non-IID Data with Federated Learning
Andrew Straiton Hard
Kurt Partridge
Cameron Nguyen
Niranjan A. Subrahmanya
Aishanee Shah
Pai Zhu
Ignacio López Moreno
Rajiv Mathews
OOD
FedML
76
67
0
21 May 2020
Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation
Po-Han Chi
Pei-Hung Chung
Tsung-Han Wu
Chun-Cheng Hsieh
Yen-Hao Chen
Shang-Wen Li
Hung-yi Lee
SSL
99
148
0
18 May 2020
Adaptive Transformers for Learning Multimodal Representations
Prajjwal Bhargava
21
4
0
15 May 2020
Cross-lingual Transfer of Sentiment Classifiers
Marko Robnik-Šikonja
Kristjan Reba
I. Mozetič
24
6
0
15 May 2020
Pre-training technique to localize medical BERT and enhance biomedical BERT
Shoya Wada
Toshihiro Takeda
S. Manabe
Shozo Konishi
Jun Kamohara
Y. Matsumura
LM&MA
64
12
0
14 May 2020
DeepRx: Fully Convolutional Deep Learning Receiver
Mikko Honkala
D. Korpi
Janne M. J. Huttunen
147
139
0
04 May 2020
Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order
Yi-Lun Liao
Xin Jiang
Qun Liu
56
40
0
24 Apr 2020
Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms
Yujing Ma
Florin Rusu
38
3
0
19 Apr 2020
ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie
Santiago Ontanon
Chris Alberti
Vaclav Cvicek
Zachary Kenneth Fisher
Philip Pham
Anirudh Ravula
Sumit Sanghai
Qifan Wang
Li Yang
98
55
0
17 Apr 2020
Analyzing Redundancy in Pretrained Transformer Models
Fahim Dalvi
Hassan Sajjad
Nadir Durrani
Yonatan Belinkov
37
2
0
08 Apr 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
MQ
140
820
0
06 Apr 2020
Solving Raven's Progressive Matrices with Multi-Layer Relation Networks
Marius Jahrens
T. Martinetz
AIMat
GNN
65
29
0
25 Mar 2020
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Zhenheng Tang
Shaoshuai Shi
Wei Wang
Yue Liu
Xiaowen Chu
83
49
0
10 Mar 2020
Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer
V. Thejas
Nipun Kwatra
Ramachandran Ramjee
Muthian Sivathanu
93
29
0
09 Mar 2020
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang
Dezun Dong
Yemao Xu
Liquan Xiao
130
12
0
06 Mar 2020
Benchmark Performance of Machine And Deep Learning Based Methodologies for Urdu Text Document Classification
Muhammad Nabeel Asim
M. Ghani
Muhammad Ali Ibrahim
Sheraz Ahmed
Waqar Mahmood
Andreas Dengel
57
19
0
03 Mar 2020
A Primer in BERTology: What we know about how BERT works
Anna Rogers
Olga Kovaleva
Anna Rumshisky
OffRL
170
1,511
0
27 Feb 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
141
151
0
26 Feb 2020
FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings
Niels Ole Salscheider
66
38
0
18 Feb 2020
Training Large Neural Networks with Constant Memory using a New Execution Algorithm
B. Pudipeddi
Maral Mesmakhosroshahi
Jinwen Xi
S. Bharadwaj
111
58
0
13 Feb 2020
CBAG: Conditional Biomedical Abstract Generation
Justin Sybrandt
Ilya Safro
MedIm
AI4CE
53
8
0
13 Feb 2020