ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2312.03038 · Cited By (v3, latest)

Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit

5 December 2023
Fanfei Meng, Lele Zhang, Yu Chen, Yuxin Wang

Papers citing "Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit" (18 of 18 papers shown)
  1. Text Compression-aided Transformer Encoding
     Z. Li, Zhuosheng Zhang, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita
     AI4CE · 57 · 45 · 0 · 11 Feb 2021
  2. Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
     Insoo Chung, Byeongwook Kim, Yoonjung Choi, S. Kwon, Yongkweon Jeon, Baeseong Park, Sangha Kim, Dongsoo Lee
     MQ · 69 · 27 · 0 · 16 Sep 2020
  3. DynaBERT: Dynamic BERT with Adaptive Width and Depth
     Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
     MQ · 79 · 322 · 0 · 08 Apr 2020
  4. FastBERT: a Self-distilling BERT with Adaptive Inference Time
     Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
     84 · 360 · 0 · 05 Apr 2020
  5. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
     Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
     255 · 7,547 · 0 · 02 Oct 2019
  6. TinyBERT: Distilling BERT for Natural Language Understanding
     Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
     VLM · 109 · 1,869 · 0 · 23 Sep 2019
  7. Importance Estimation for Neural Network Pruning
     Pavlo Molchanov, Arun Mallya, Stephen Tyree, I. Frosio, Jan Kautz
     3DPC · 81 · 885 · 0 · 25 Jun 2019
  8. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
     Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi
     82 · 912 · 0 · 12 Jun 2019
  9. Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
     Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek V. Menon, Sun Choi, Kushal Datta, V. Saletore
     MQ · 74 · 132 · 0 · 03 Jun 2019
  10. Are Sixteen Heads Really Better than One?
     Paul Michel, Omer Levy, Graham Neubig
     MoE · 105 · 1,068 · 0 · 25 May 2019
  11. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
     Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
     117 · 1,146 · 0 · 23 May 2019
  12. Adaptive Attention Span in Transformers
     Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
     76 · 286 · 0 · 19 May 2019
  13. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
     Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
     72 · 421 · 0 · 28 Mar 2019
  14. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
     Jian-Hao Luo, Jianxin Wu, Weiyao Lin
     58 · 1,761 · 0 · 20 Jul 2017
  15. Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples
     Haw-Shiuan Chang, Erik Learned-Miller, Andrew McCallum
     86 · 354 · 0 · 24 Apr 2017
  16. Soft Weight-Sharing for Neural Network Compression
     Karen Ullrich, Edward Meeds, Max Welling
     169 · 419 · 0 · 13 Feb 2017
  17. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
     Song Han, Huizi Mao, W. Dally
     3DGS · 263 · 8,859 · 0 · 01 Oct 2015
  18. Learning both Weights and Connections for Efficient Neural Networks
     Song Han, Jeff Pool, J. Tran, W. Dally
     CVBM · 313 · 6,700 · 0 · 08 Jun 2015