Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit
5 December 2023
Fanfei Meng, Lele Zhang, Yu Chen, Yuxin Wang
arXiv: 2312.03038 (v3, latest) | abs | PDF | HTML

Papers citing "Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit" (18 / 18 papers shown)

Text Compression-aided Transformer Encoding
Z. Li, Zhuosheng Zhang, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita (11 Feb 2021)

Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
Insoo Chung, Byeongwook Kim, Yoonjung Choi, S. Kwon, Yongkweon Jeon, Baeseong Park, Sangha Kim, Dongsoo Lee (16 Sep 2020)

DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu (08 Apr 2020)

FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju (05 Apr 2020)

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (02 Oct 2019)

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu (23 Sep 2019)

Importance Estimation for Neural Network Pruning
Pavlo Molchanov, Arun Mallya, Stephen Tyree, I. Frosio, Jan Kautz (25 Jun 2019)

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi (12 Jun 2019)

Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek V. Menon, Sun Choi, Kushal Datta, V. Saletore (03 Jun 2019)

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig (25 May 2019)

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov (23 May 2019)

Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin (19 May 2019)

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin (28 Mar 2019)

ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
Jian-Hao Luo, Jianxin Wu, Weiyao Lin (20 Jul 2017)

Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples
Haw-Shiuan Chang, Erik Learned-Miller, Andrew McCallum (24 Apr 2017)

Soft Weight-Sharing for Neural Network Compression
Karen Ullrich, Edward Meeds, Max Welling (13 Feb 2017)

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han, Huizi Mao, W. Dally (01 Oct 2015)

Learning both Weights and Connections for Efficient Neural Networks
Song Han, Jeff Pool, J. Tran, W. Dally (08 Jun 2015)