Full Stack Optimization of Transformer Inference: a Survey

27 February 2023
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
Hasan Genç
Grace Dinh
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
    MQ

Papers citing "Full Stack Optimization of Transformer Inference: a Survey"

43 / 143 papers shown
Title
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
Xuan S. Yang
Mingyu Gao
Qiaoyi Liu
Jeff Setter
Jing Pu
...
Kaidi Cao
Heonjae Ha
Priyanka Raina
Christos Kozyrakis
M. Horowitz
155
228
0
10 Sep 2018
Neural Architecture Optimization
Renqian Luo
Fei Tian
Tao Qin
Enhong Chen
Tie-Yan Liu
3DV
77
654
0
22 Aug 2018
MnasNet: Platform-Aware Neural Architecture Search for Mobile
Mingxing Tan
Bo Chen
Ruoming Pang
Vijay Vasudevan
Mark Sandler
Andrew G. Howard
Quoc V. Le
MQ
120
3,010
0
31 Jul 2018
Beyond Data and Model Parallelism for Deep Neural Networks
Zhihao Jia
Matei A. Zaharia
A. Aiken
GNN
AI4CE
59
504
0
14 Jul 2018
DARTS: Differentiable Architecture Search
Hanxiao Liu
Karen Simonyan
Yiming Yang
199
4,355
0
24 Jun 2018
DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures
Jin-Dong Dong
A. Cheng
Da-Cheng Juan
Wei Wei
Min Sun
64
181
0
21 Jun 2018
Online normalizer calculation for softmax
Maxim Milakov
N. Gimelshein
74
91
0
08 May 2018
Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
Riyadh Baghdadi
Jessica Ray
Malek Ben Romdhane
Emanuele Del Sozzo
Abdurrahman Akkas
Yunming Zhang
Patricia Suriana
Shoaib Kamil
Saman P. Amarasinghe
47
259
0
27 Apr 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,159
0
20 Apr 2018
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
Tien-Ju Yang
Andrew G. Howard
Bo Chen
Xiao Zhang
Alec Go
Mark Sandler
Vivienne Sze
Hartwig Adam
133
521
0
09 Apr 2018
An Approach for Finding Permutations Quickly: Fusion and Dimension matching
Aravind Acharya
Uday Bondhugula
Albert Cohen
18
3
0
28 Mar 2018
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle
Michael Carbin
230
3,473
0
09 Mar 2018
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Nicolas Vasilache
O. Zinenko
Theodoros Theodoridis
Priya Goyal
Zach DeVito
William S. Moses
Sven Verdoolaege
Andrew Adams
Albert Cohen
71
436
0
13 Feb 2018
Efficient Neural Architecture Search via Parameter Sharing
Hieu H. Pham
M. Guan
Barret Zoph
Quoc V. Le
J. Dean
110
2,763
0
09 Feb 2018
Regularized Evolution for Image Classifier Architecture Search
Esteban Real
A. Aggarwal
Yanping Huang
Quoc V. Le
160
3,031
0
05 Feb 2018
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler
Andrew G. Howard
Menglong Zhu
A. Zhmoginov
Liang-Chieh Chen
181
19,284
0
13 Jan 2018
Lectures on Randomized Numerical Linear Algebra
P. Drineas
Michael W. Mahoney
44
76
0
24 Dec 2017
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
150
3,130
0
15 Dec 2017
Progressive Neural Architecture Search
Chenxi Liu
Barret Zoph
Maxim Neumann
Jonathon Shlens
Wei Hua
Li Li
Li Fei-Fei
Alan Yuille
Jonathan Huang
Kevin Patrick Murphy
109
1,991
0
02 Dec 2017
Hierarchical Representations for Efficient Architecture Search
Hanxiao Liu
Karen Simonyan
Oriol Vinyals
Chrisantha Fernando
Koray Kavukcuoglu
3DV
87
928
0
01 Nov 2017
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Daniel Cer
Mona T. Diab
Eneko Agirre
I. Lopez-Gazpio
Lucia Specia
430
1,882
0
31 Jul 2017
Learning Transferable Architectures for Scalable Image Recognition
Barret Zoph
Vijay Vasudevan
Jonathon Shlens
Quoc V. Le
174
5,603
0
21 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
701
131,652
0
12 Jun 2017
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
A. Parashar
Minsoo Rhu
Anurag Mukkara
A. Puglielli
Rangharajan Venkatesan
Brucek Khailany
J. Emer
S. Keckler
W. Dally
75
1,126
0
23 May 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
520
4,479
0
18 Apr 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.1K
20,837
0
17 Apr 2017
In-Datacenter Performance Analysis of a Tensor Processing Unit
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
...
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
235
4,635
0
16 Apr 2017
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Vivienne Sze
Yu-hsin Chen
Tien-Ju Yang
J. Emer
AAML
3DV
120
3,022
0
27 Mar 2017
Designing Neural Network Architectures using Reinforcement Learning
Bowen Baker
O. Gupta
Nikhil Naik
Ramesh Raskar
113
1,471
0
07 Nov 2016
Neural Architecture Search with Reinforcement Learning
Barret Zoph
Quoc V. Le
462
5,372
0
05 Nov 2016
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
319
2,859
0
26 Sep 2016
Gaussian Error Linear Units (GELUs)
Dan Hendrycks
Kevin Gimpel
169
5,000
0
27 Jun 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
283
8,134
0
16 Jun 2016
TensorFlow: A system for large-scale machine learning
Martín Abadi
P. Barham
Jianmin Chen
Zhifeng Chen
Andy Davis
...
Vijay Vasudevan
Pete Warden
Martin Wicke
Yuan Yu
Xiaoqiang Zheng
GNN
AI4CE
433
18,361
0
27 May 2016
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
F. Iandola
Song Han
Matthew W. Moskewicz
Khalid Ashraf
W. Dally
Kurt Keutzer
150
7,486
0
24 Feb 2016
EIE: Efficient Inference Engine on Compressed Deep Neural Network
Song Han
Xingyu Liu
Huizi Mao
Jing Pu
A. Pedram
M. Horowitz
W. Dally
121
2,457
0
04 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,020
0
10 Dec 2015
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
Tianqi Chen
Mu Li
Yutian Li
Min Lin
Naiyan Wang
Minjie Wang
Tianjun Xiao
Bing Xu
Chiyuan Zhang
Zheng Zhang
196
2,247
0
03 Dec 2015
cuDNN: Efficient Primitives for Deep Learning
Sharan Chetlur
Cliff Woolley
Philippe Vandermersch
Jonathan M. Cohen
J. Tran
Bryan Catanzaro
Evan Shelhamer
133
1,848
0
03 Oct 2014
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
465
43,658
0
17 Sep 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.6K
100,386
0
04 Sep 2014
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia
Evan Shelhamer
Jeff Donahue
Sergey Karayev
Jonathan Long
Ross B. Girshick
S. Guadarrama
Trevor Darrell
VLM
BDL
3DV
274
14,711
0
20 Jun 2014
Practical Bayesian Optimization of Machine Learning Algorithms
Jasper Snoek
Hugo Larochelle
Ryan P. Adams
353
7,942
0
13 Jun 2012