ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.01107
  4. Cited By
A Survey on Efficient Training of Transformers

A Survey on Efficient Training of Transformers

2 February 2023
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
ArXivPDFHTML

Papers citing "A Survey on Efficient Training of Transformers"

34 / 84 papers shown
Title
Optimal checkpointing for heterogeneous chains: how to train deep neural
  networks with limited memory
Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory
Julien Herrmann
Olivier Beaumont
Lionel Eyraud-Dubois
J. Herrmann
Alexis Joly
Alena Shilova
BDL
60
29
0
27 Nov 2019
Rigging the Lottery: Making All Tickets Winners
Rigging the Lottery: Making All Tickets Winners
Utku Evci
Trevor Gale
Jacob Menick
Pablo Samuel Castro
Erich Elsen
156
592
0
25 Nov 2019
On the Relationship between Self-Attention and Convolutional Layers
On the Relationship between Self-Attention and Convolutional Layers
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
99
530
0
08 Nov 2019
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam Rajbhandari
Jeff Rasley
Olatunji Ruwase
Yuxiong He
ALM
AI4CE
77
852
0
04 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language
  Representations
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
294
6,420
0
26 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using
  Model Parallelism
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
303
1,861
0
17 Sep 2019
Green AI
Green AI
Roy Schwartz
Jesse Dodge
Noah A. Smith
Oren Etzioni
97
1,124
0
22 Jul 2019
Energy and Policy Considerations for Deep Learning in NLP
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell
Ananya Ganesh
Andrew McCallum
60
2,633
0
05 Jun 2019
MASS: Masked Sequence to Sequence Pre-training for Language Generation
MASS: Masked Sequence to Sequence Pre-training for Language Generation
Kaitao Song
Xu Tan
Tao Qin
Jianfeng Lu
Tie-Yan Liu
99
962
0
07 May 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
188
991
0
01 Apr 2019
Parameter-Efficient Transfer Learning for NLP
Parameter-Efficient Transfer Learning for NLP
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
204
4,368
0
02 Feb 2019
The Evolved Transformer
The Evolved Transformer
David R. So
Chen Liang
Quoc V. Le
ViT
89
462
0
30 Jan 2019
Fixup Initialization: Residual Learning Without Normalization
Fixup Initialization: Residual Learning Without Normalization
Hongyi Zhang
Yann N. Dauphin
Tengyu Ma
ODL
AI4CE
82
348
0
27 Jan 2019
Training Deep Neural Networks with 8-bit Floating Point Numbers
Training Deep Neural Networks with 8-bit Floating Point Numbers
Naigang Wang
Jungwook Choi
D. Brand
Chia-Yu Chen
K. Gopalakrishnan
MQ
56
500
0
19 Dec 2018
Learning and Generalization in Overparameterized Neural Networks, Going
  Beyond Two Layers
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-Zhu
Yuanzhi Li
Yingyu Liang
MLT
153
769
0
12 Nov 2018
A Convergence Theory for Deep Learning via Over-Parameterization
A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu
Yuanzhi Li
Zhao Song
AI4CE
ODL
212
1,457
0
09 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.3K
93,936
0
11 Oct 2018
SNIP: Single-shot Network Pruning based on Connection Sensitivity
SNIP: Single-shot Network Pruning based on Connection Sensitivity
Namhoon Lee
Thalaiyasingam Ajanthan
Philip Torr
VLM
221
1,190
0
04 Oct 2018
Learning Overparameterized Neural Networks via Stochastic Gradient
  Descent on Structured Data
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Yuanzhi Li
Yingyu Liang
MLT
176
652
0
03 Aug 2018
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle
Michael Carbin
187
3,433
0
09 Mar 2018
Not All Samples Are Created Equal: Deep Learning with Importance
  Sampling
Not All Samples Are Created Equal: Deep Learning with Importance Sampling
Angelos Katharopoulos
François Fleuret
67
515
0
02 Mar 2018
On the Optimization of Deep Networks: Implicit Acceleration by
  Overparameterization
On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Sanjeev Arora
Nadav Cohen
Elad Hazan
91
481
0
19 Feb 2018
Mixed Precision Training of Convolutional Neural Networks using Integer
  Operations
Mixed Precision Training of Convolutional Neural Networks using Integer Operations
Dipankar Das
Naveen Mellempudi
Dheevatsa Mudigere
Dhiraj D. Kalamkar
Sasikanth Avancha
...
J. Corbal
N. Shustrov
R. Dubtsov
Evarist Fomenko
V. Pirogov
MQ
61
154
0
03 Feb 2018
Mixed Precision Training
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
143
1,779
0
10 Oct 2017
Large Batch Training of Convolutional Networks
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
116
844
0
13 Aug 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
526
129,831
0
12 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
107
3,666
0
08 Jun 2017
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex
  Optimization
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization
Qunwei Li
Yi Zhou
Yingbin Liang
P. Varshney
101
94
0
14 May 2017
In-Datacenter Performance Analysis of a Tensor Processing Unit
In-Datacenter Performance Analysis of a Tensor Processing Unit
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
...
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
201
4,619
0
16 Apr 2017
Quantized Neural Networks: Training Neural Networks with Low Precision
  Weights and Activations
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
Itay Hubara
Matthieu Courbariaux
Daniel Soudry
Ran El-Yaniv
Yoshua Bengio
MQ
126
1,859
0
22 Sep 2016
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low
  Bitwidth Gradients
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Shuchang Zhou
Yuxin Wu
Zekun Ni
Xinyu Zhou
He Wen
Yuheng Zou
MQ
107
2,080
0
20 Jun 2016
Optimization Methods for Large-Scale Machine Learning
Optimization Methods for Large-Scale Machine Learning
Léon Bottou
Frank E. Curtis
J. Nocedal
195
3,198
0
15 Jun 2016
Training Deep Nets with Sublinear Memory Cost
Training Deep Nets with Sublinear Memory Cost
Tianqi Chen
Bing Xu
Chiyuan Zhang
Carlos Guestrin
95
1,156
0
21 Apr 2016
Adam: A Method for Stochastic Optimization
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.2K
149,474
0
22 Dec 2014
Previous
12