Demystifying BERT: Implications for Accelerator Design

14 April 2021
Suchita Pati, Shaizeen Aga, Nuwan Jayasena, Matthew D. Sinclair

Papers citing "Demystifying BERT: Implications for Accelerator Design"

39 papers shown

GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent
Heesu Kim, Hanmin Park, Taehyun Kim, Kwanheum Cho, Eojin Lee, Soojung Ryu, Hyuk-Jae Lee, Kiyoung Choi, Jinho Lee
15 Feb 2021

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures
Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou, Vasileios Karakostas, Ivan Fernandez, Juan Gómez Luna, Lois Orosa, N. Koziris, G. Goumas, O. Mutlu
19 Jan 2021

Exploring the Limits of Concurrency in ML Training on Google TPUs
Sameer Kumar, James Bradbury, C. Young, Yu Emma Wang, Anselm Levskaya, ..., Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing
07 Nov 2020

FPRaker: A Processing Element For Accelerating Neural Network Training
Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo Vivancos, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos
15 Oct 2020

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference
Mostafa Mahmoud, Isak Edo Vivancos, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos
01 Sep 2020

SeqPoint: Identifying Representative Iterations of Sequence-based Neural Networks
Suchita Pati, Shaizeen Aga, Matthew D. Sinclair, Nuwan Jayasena
20 Jul 2020

Automatic Horizontal Fusion for GPU Kernels
Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long
02 Jul 2020

Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
30 Jun 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
28 May 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos
08 May 2020

FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training
Sangkug Lym, M. Erez
27 Apr 2020

A³: Accelerating Attention Mechanisms in Neural Networks with Approximation
Tae Jun Ham, Sungjun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, ..., Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, D. Jeong
22 Feb 2020

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference
Udit Gupta, Samuel Hsia, V. Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu
08 Jan 2020

MLPerf Inference Benchmark
Vijay Janapa Reddi, C. Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, ..., Bing Yu, George Y. Yuan, Aaron Zhong, P. Zhang, Yuchen Zhou
06 Nov 2019

MLPerf Training Benchmark
Arya D. McCarthy, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, ..., Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, C. Young, Matei A. Zaharia
02 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
26 Sep 2019

Scale MLPerf-0.6 Models on Google TPU-v3 Pods
Sameer Kumar, Victor Bitorff, Dehao Chen, Chi-Heng Chou, Blake A. Hechtman, ..., Peter Mattson, Shibo Wang, Tao Wang, Yuanzhong Xu, Zongwei Zhou
21 Sep 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

Demystifying the MLPerf Benchmark Suite
Snehil Verma, Qinzhe Wu, Bagus Hanindhito, Gunjan Jha, E. John, R. Radhakrishnan, L. John
24 Aug 2019

ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang
29 Jul 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
26 Jul 2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
19 Jun 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
01 Apr 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
09 Jan 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018

Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training
Bojian Zheng, Abhishek Tiwari, Nandita Vijaykumar, Gennady Pekhimenko
22 May 2018

TBD: Benchmarking and Analyzing Deep Neural Network Training
Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko
16 Mar 2018

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
A. Parashar, Minsoo Rhu, Anurag Mukkara, A. Puglielli, Rangharajan Venkatesan, Brucek Khailany, J. Emer, S. Keckler, W. Dally
23 May 2017

In-Datacenter Performance Analysis of a Tensor Processing Unit
N. Jouppi, C. Young, Nishant Patil, David Patterson, Gaurav Agrawal, ..., Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon
16 Apr 2017

An Overview of Gradient Descent Optimization Algorithms
Sebastian Ruder
15 Sep 2016

Fathom: Reference Workloads for Modern Deep Learning Methods
Robert Adolf, Saketh Rama, Brandon Reagen, Gu-Yeon Wei, David Brooks
23 Aug 2016

Gaussian Error Linear Units (GELUs)
Dan Hendrycks, Kevin Gimpel
27 Jun 2016

SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
16 Jun 2016

Optimizing Performance of Recurrent Neural Networks on GPUs
J. Appleyard, Tomás Kociský, Phil Blunsom
07 Apr 2016

EIE: Efficient Inference Engine on Compressed Deep Neural Network
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, A. Pedram, M. Horowitz, W. Dally
04 Feb 2016

How Transferable Are Features in Deep Neural Networks?
J. Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson
06 Nov 2014

CNN Features Off-the-shelf: An Astounding Baseline for Recognition
A. Razavian, Hossein Azizpour, Josephine Sullivan, S. Carlsson
23 Mar 2014

Optimizing CUDA Code By Kernel Fusion: Application on BLAS
Jiri Filipovic, M. Madzin, J. Fousek, L. Matyska
06 May 2013