2110.10030
Cited By
Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization
19 October 2021
Panjie Qi
E. Sha
Qingfeng Zhuge
Hongwu Peng
Shaoyi Huang
Zhenglun Kong
Yuhong Song
Bingbing Li
Papers citing "Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization"
24 / 24 papers shown
Title
Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores
Orestis Zachariadis
Nitin Satpute
Juan Gómez Luna
J. Olivares
52
61
0
29 Sep 2020
Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
Bingbing Li
Zhenglun Kong
Tianyun Zhang
Ji Li
Zechao Li
Hang Liu
Caiwen Ding
VLM
136
64
0
17 Sep 2020
Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity
Cong Guo
B. Hsueh
Jingwen Leng
Yuxian Qiu
Yue Guan
Zehuan Wang
Xiaoying Jia
Xipeng Li
Minyi Guo
Yuhao Zhu
65
83
0
29 Aug 2020
FTRANS: Energy-Efficient Acceleration of Transformers using FPGA
Bingbing Li
Santosh Pandey
Haowen Fang
Yanjun Lyv
Ji Li
Jieyang Chen
Mimi Xie
Lipeng Wan
Hang Liu
Caiwen Ding
AI4CE
49
178
0
16 Jul 2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Hanrui Wang
Zhanghao Wu
Zhijian Liu
Han Cai
Ligeng Zhu
Chuang Gan
Song Han
88
262
0
28 May 2020
CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks
Runbin Shi
Peiyan Dong
Tong Geng
Yuhao Ding
Xiaolong Ma
Hayden Kwok-Hay So
Martin C. Herbordt
Ang Li
Yanzhi Wang
MQ
72
13
0
11 May 2020
When BERT Plays the Lottery, All Tickets Are Winning
Sai Prasanna
Anna Rogers
Anna Rumshisky
MILM
51
187
0
01 May 2020
A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation
Tae Jun Ham
Sungjun Jung
Seonghak Kim
Young H. Oh
Yeonhong Park
...
Jung-Hun Park
Sanghee Lee
Kyoung Park
Jae W. Lee
D. Jeong
57
218
0
22 Feb 2020
Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks
Lei Yang
Zheyu Yan
Meng Li
Hyoukjun Kwon
Liangzhen Lai
T. Krishna
Vikas Chandra
Weiwen Jiang
Yiyu Shi
77
115
0
10 Feb 2020
Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing
En Li
Liekang Zeng
Zhi Zhou
Xu Chen
49
627
0
04 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
232
7,504
0
02 Oct 2019
PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices
Xiaolong Ma
Fu-Ming Guo
Wei Niu
Xue Lin
Jian Tang
Kaisheng Ma
Bin Ren
Yanzhi Wang
CVBM
49
176
0
06 Sep 2019
Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference
Weiwen Jiang
E. Sha
Xinyi Zhang
Lei Yang
Qingfeng Zhuge
Yiyu Shi
Jiaxi Hu
55
75
0
21 Jul 2019
Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search
Weiwen Jiang
Xinyi Zhang
E. Sha
Lei Yang
Qingfeng Zhuge
Yiyu Shi
Jiaxi Hu
57
124
0
31 Jan 2019
The Evolved Transformer
David R. So
Chen Liang
Quoc V. Le
ViT
107
462
0
30 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,770
0
11 Oct 2018
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Mia Xu Chen
Orhan Firat
Ankur Bapna
Melvin Johnson
Wolfgang Macherey
...
Niki Parmar
M. Schuster
Zhifeng Chen
Yonghui Wu
Macduff Hughes
AIMat
58
457
0
26 Apr 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,154
0
20 Apr 2018
Block-Sparse Recurrent Neural Networks
Sharan Narang
Eric Undersander
G. Diamos
44
136
0
08 Nov 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
695
131,526
0
12 Jun 2017
Neural Architecture Search with Reinforcement Learning
Barret Zoph
Quoc V. Le
459
5,372
0
05 Nov 2016
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
308
2,859
0
26 Sep 2016
Reasoning about Entailment with Neural Attention
Tim Rocktäschel
Edward Grefenstette
Karl Moritz Hermann
Tomáš Kočiský
Phil Blunsom
NAI
62
762
0
22 Sep 2015
CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication
Weifeng Liu
B. Vinter
70
289
0
17 Mar 2015