Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin
25 September 2019 · arXiv:1909.11556
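
LayerDrop, the technique this paper introduces, trains a transformer with structured dropout over whole layers so that shallower sub-networks can be extracted at test time without fine-tuning. Below is a minimal PyTorch sketch of that idea, assuming a standard encoder stack; the class name, the 0.2 drop rate, and the `keep_every` pruning rule are illustrative choices for this example, not the authors' fairseq implementation.

```python
import torch
import torch.nn as nn

class LayerDropEncoder(nn.Module):
    """Transformer encoder stack with LayerDrop-style structured dropout.

    During training, each layer is skipped entirely with probability
    p_drop, which regularizes the model and makes it robust to missing
    layers. At inference, a shallower sub-network can be selected on
    demand (here: keep every k-th layer) without any retraining.
    """

    def __init__(self, num_layers: int = 12, d_model: int = 512,
                 nhead: int = 8, p_drop: float = 0.2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )
        self.p_drop = p_drop

    def forward(self, x: torch.Tensor, keep_every: int = 1) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            # Training: drop the whole layer (structured dropout).
            if self.training and torch.rand(()).item() < self.p_drop:
                continue
            # Inference: prune on demand by keeping layers 0, k, 2k, ...
            if not self.training and i % keep_every != 0:
                continue
            x = layer(x)
        return x

# Usage: stochastic depth while training, half depth at inference.
enc = LayerDropEncoder()
x = torch.randn(2, 16, 512)
enc.train()
y_train = enc(x)               # ~80% of layers applied, in expectation
enc.eval()
y_half = enc(x, keep_every=2)  # 6-layer sub-network, no retraining
```

The key design point is that depth reduction is a free inference-time choice: because training already exposed the network to random layer removal, pruning to a target depth needs no extra fine-tuning step.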

Papers citing "Reducing Transformer Depth on Demand with Structured Dropout"

50 / 400 papers shown

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Yangyang Shi, Varun K. Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, ..., Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, M. Seltzer
05 Apr 2021

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training
Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, ..., Jacob Kahn, Ann Lee, R. Collobert, Gabriel Synnaeve, Michael Auli
02 Apr 2021 · SSL

Going deeper with Image Transformers
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou
31 Mar 2021 · ViT

Finetuning Pretrained Transformers into RNNs
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
24 Mar 2021

The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Sushant Singh, A. Mahmood
23 Mar 2021 · AI4TS

IOT: Instance-wise Layer Reordering for Transformer Structures
Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wen-gang Zhou, Houqiang Li, Tie-Yan Liu
05 Mar 2021

Memory-efficient Speech Recognition on Smart Devices
Ganesh Venkatesh, Alagappan Valliappan, Jay Mahadeokar, Shangguan Yuan, Christian Fuegen, M. Seltzer, Vikas Chandra
23 Feb 2021

End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
Prashanth Gurunath Shivakumar, Shrikanth Narayanan
19 Feb 2021

Learning Dynamic BERT via Trainable Gate Variables and a Bi-modal Regularizer
Seohyeong Jeong, Nojun Kwak
19 Feb 2021

TransReID: Transformer-based Object Re-Identification
Shuting He, Haowen Luo, Pichao Wang, F. Wang, Hao Li, Wei Jiang
08 Feb 2021 · ViT

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning
Yuhan Liu, Saurabh Agarwal, Shivaram Venkataraman
02 Feb 2021 · OffRL

BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data
Demetres Kostas, Stephane Aroca-Ouellette, Frank Rudzicz
28 Jan 2021 · SSL

Model Compression for Domain Adaptation through Causal Effect Estimation
Guy Rotman, Amir Feder, Roi Reichart
18 Jan 2021 · CML

KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization
Jing Jin, Cai Liang, Tiancheng Wu, Li Zou, Zhiliang Gan
15 Jan 2021 · MQ

I-BERT: Integer-only BERT Quantization
Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer
05 Jan 2021 · MQ

An Efficient Transformer Decoder with Compressed Sub-layers
Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu
03 Jan 2021

Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers
Machel Reid, Edison Marrese-Taylor, Y. Matsuo
01 Jan 2021 · MoE

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu
31 Dec 2020

BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
31 Dec 2020 · MQ

Reservoir Transformers
Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela
30 Dec 2020

CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade
Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun
29 Dec 2020

Learning Light-Weight Translation Models from Deep Transformer
Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
27 Dec 2020 · VLM

Training data-efficient image transformers & distillation through attention
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
23 Dec 2020 · ViT

A Survey on Visual Transformer
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, ..., Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
23 Dec 2020 · ViT

Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity
Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, Baishakhi Ray
16 Dec 2020

Improving Task-Agnostic BERT Distillation with Layer Mapping Search
Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
11 Dec 2020

MLS: A Large-Scale Multilingual Dataset for Speech Research
Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, R. Collobert
07 Dec 2020 · AuLLM

Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
Shruti Bhosale, Kyra Yee, Sergey Edunov, Michael Auli
13 Nov 2020

Multilingual AMR-to-Text Generation
Angela Fan, Claire Gardent
10 Nov 2020

Don't Read Too Much into It: Adaptive Computation for Open-Domain Question Answering
Yuxiang Wu, Sebastian Riedel, Pasquale Minervini, Pontus Stenetorp
10 Nov 2020

Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models
Shucong Zhang, Erfan Loweimi, P. Bell, Steve Renals
08 Nov 2020

Rethinking the Value of Transformer Components
Wenxuan Wang, Zhaopeng Tu
07 Nov 2020

Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads
Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun
07 Nov 2020 · VLM

Optimizing Transformer for Low-Resource Neural Machine Translation
Ali Araabi, Christof Monz
04 Nov 2020 · VLM

Joint Masked CPC and CTC Training for ASR
Chaitanya Talnikar, Tatiana Likhomanenko, R. Collobert, Gabriel Synnaeve
30 Oct 2020 · SSL

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Minjia Zhang, Yuxiong He
26 Oct 2020 · AI4CE

Pre-trained Summarization Distillation
Sam Shleifer, Alexander M. Rush
24 Oct 2020

AdapterDrop: On the Efficiency of Adapters in Transformers
Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych
22 Oct 2020

Rethinking Evaluation in ASR: Are Our Models Robust Enough?
Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, R. Collobert, Gabriel Synnaeve
22 Oct 2020

SlimIPL: Language-Model-Free Iterative Pseudo-Labeling
Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, R. Collobert
22 Oct 2020 · VLM

Beyond English-Centric Multilingual Machine Translation
Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, ..., Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin
21 Oct 2020 · LRM

Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation
Qiang Wang, Tong Xiao, Jingbo Zhu
16 Oct 2020

Generating Diverse Translation from Model Distribution with Dropout
Xuanfu Wu, Yang Feng, Chenze Shao
16 Oct 2020

Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
Gyuwan Kim, Kyunghyun Cho
14 Oct 2020

Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin
14 Oct 2020

Adversarial Self-Supervised Data-Free Distillation for Text Classification
Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu
10 Oct 2020

Deep Learning Meets Projective Clustering
Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman
08 Oct 2020

Population Based Training for Data Augmentation and Regularization in Speech Recognition
Daniel Haziza, Jérémy Rapin, Gabriel Synnaeve
08 Oct 2020

On the importance of pre-training data volume for compact language models
Vincent Micheli, Martin d'Hoffschmidt, François Fleuret
08 Oct 2020

AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models
Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, A. Raghunathan
07 Oct 2020