arXiv: 2302.14017
Cited By
Full Stack Optimization of Transformer Inference: a Survey
27 February 2023
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genç, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami
Tags: MQ
Links: arXiv · PDF · HTML
Papers citing "Full Stack Optimization of Transformer Inference: a Survey" (50 of 143 papers shown)
Training data-efficient image transformers & distillation through attention
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
ViT · 6,768 citations · 23 Dec 2020

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead
Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Mohamed Bennai
BDL · 145 citations · 21 Dec 2020

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Hanrui Wang, Zhekai Zhang, Song Han
390 citations · 17 Dec 2020

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference
Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, ..., Victor Sanh, P. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei
123 citations · 28 Nov 2020

HAWQV3: Dyadic Neural Network Quantization
Z. Yao, Zhen Dong, Zhangcheng Zheng, A. Gholami, Jiali Yu, ..., Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, Kurt Keutzer
MQ · 86 citations · 20 Nov 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
ViT · 41,103 citations · 22 Oct 2020

Transferable Graph Optimizers for ML Compilers
Yanqi Zhou, Sudip Roy, AmirAli Abdolrashidi, Daniel Wong, Peter C. Ma, ..., Mangpo Phitchaya Phothilimtha, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon
GNN · 55 citations · 21 Oct 2020

Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
Gyuwan Kim, Kyunghyun Cho
96 citations · 14 Oct 2020

Sparse Quantized Spectral Clustering
Zhenyu Liao, Romain Couillet, Michael W. Mahoney
MQ · 16 citations · 03 Oct 2020

ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
Sheng-Chun Kao, Geonhwa Jeong, T. Krishna
96 citations · 04 Sep 2020

MCUNet: Tiny Deep Learning on IoT Devices
Ji Lin, Wei-Ming Chen, Chengyue Wu, J. Cohn, Chuang Gan, Song Han
488 citations · 20 Jul 2020

FTRANS: Energy-Efficient Acceleration of Transformers using FPGA
Bingbing Li, Santosh Pandey, Haowen Fang, Yanjun Lyv, Ji Li, Jieyang Chen, Mimi Xie, Lipeng Wan, Hang Liu, Caiwen Ding
AI4CE · 179 citations · 16 Jul 2020

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li
82 citations · 02 Jul 2020

SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
F. Iandola, Albert Eaton Shaw, Ravi Krishna, Kurt Keutzer
VLM · 127 citations · 19 Jun 2020

Ansor: Generating High-Performance Tensor Programs for Deep Learning
Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, ..., Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica
399 citations · 11 Jun 2020

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions
Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao (Bernie) Huang, Zhihui Li, Xiaojiang Chen, Xin Wang
AI4CE · 674 citations · 01 Jun 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
262 citations · 28 May 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 42,055 citations · 28 May 2020

Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, ..., Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang
3,139 citations · 16 May 2020

Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh, Thomas Wolf, Alexander M. Rush
486 citations · 15 May 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos
MQ · 188 citations · 08 May 2020

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin
374 citations · 27 Apr 2020

Lite Transformer with Long-Short Range Attention
Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han
322 citations · 24 Apr 2020

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions
Alvin Wan, Xiaoliang Dai, Peizhao Zhang, Zijian He, Yuandong Tian, ..., Matthew Yu, Tao Xu, Kan Chen, Peter Vajda, Joseph E. Gonzalez
291 citations · 12 Apr 2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
MQ · 322 citations · 08 Apr 2020

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
MQ · 816 citations · 06 Apr 2020

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc V. Le
304 citations · 24 Mar 2020

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Chaoyue Liu, Libin Zhu, M. Belkin
ODL · 262 citations · 29 Feb 2020

A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation
Tae Jun Ham, Sungjun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, ..., Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, D. Jeong
218 citations · 22 Feb 2020

PyHessian: Neural Networks Through the Lens of the Hessian
Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
ODL · 303 citations · 16 Dec 2019

PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala
ODL · 42,449 citations · 03 Dec 2019

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration
Hasan Genç, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, ..., Ion Stoica, Jonathan Ragan-Kelley, Krste Asanović, B. Nikolić, Y. Shao
228 citations · 22 Nov 2019

Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
RALM · VLM · KELM · 648 citations · 13 Nov 2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
AIMat · 20,181 citations · 23 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL · AIMat · 6,455 citations · 26 Sep 2019

Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin
593 citations · 25 Sep 2019

Once-for-All: Train One Network and Specialize it for Efficient Deployment
Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
OOD · 1,281 citations · 26 Aug 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat · 24,464 citations · 26 Jul 2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
AI4CE · 8,433 citations · 19 Jun 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan, Quoc V. Le
3DV · MedIm · 18,134 citations · 28 May 2019

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig
MoE · 1,061 citations · 25 May 2019

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
Zhen Dong, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
MQ · 526 citations · 29 Apr 2019

Single Path One-Shot Neural Architecture Search with Uniform Sampling
Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, Jian Sun
938 citations · 31 Mar 2019

The State of Sparsity in Deep Neural Networks
Trevor Gale, Erich Elsen, Sara Hooker
758 citations · 25 Feb 2019

The Evolved Transformer
David R. So, Chen Liang, Quoc V. Le
ViT · 462 citations · 30 Jan 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
VLM · 3,728 citations · 09 Jan 2019

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search
Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer
MQ · 1,303 citations · 09 Dec 2018

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Han Cai, Ligeng Zhu, Song Han
1,867 citations · 02 Dec 2018

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
Bichen Wu, Yanghan Wang, Peizhao Zhang, Yuandong Tian, Peter Vajda, Kurt Keutzer
MQ · 273 citations · 30 Nov 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM · SSL · SSeg · 94,891 citations · 11 Oct 2018