Reducing Transformer Depth on Demand with Structured Dropout

25 September 2019
Angela Fan, Edouard Grave, Armand Joulin
arXiv: 1909.11556 (abs / PDF / HTML)
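For context, the cited paper introduces LayerDrop, a structured form of dropout that randomly skips entire transformer layers during training; at inference time, layers can then be pruned to any target depth without fine-tuning. Below is a minimal PyTorch sketch of the idea; the class name, drop rate, and layer configuration are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class LayerDropEncoder(nn.Module):
        """Encoder stack trained with LayerDrop: each layer is skipped
        independently with probability p_drop, so shallower sub-networks
        stay usable at inference by simply pruning layers."""

        def __init__(self, layers, p_drop=0.2):
            super().__init__()
            self.layers = nn.ModuleList(layers)
            self.p_drop = p_drop

        def forward(self, x):
            for layer in self.layers:
                # During training, replace the whole layer by the identity
                # with probability p_drop; at eval time, run every layer.
                if self.training and torch.rand(1).item() < self.p_drop:
                    continue
                x = layer(x)
            return x

    # Usage: a 12-layer encoder; after LayerDrop training, e.g. every other
    # layer could be removed to obtain a 6-layer model on demand.
    encoder = LayerDropEncoder(
        [nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(12)]
    )
    out = encoder(torch.randn(10, 2, 512))  # (seq_len, batch, d_model)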

Papers citing "Reducing Transformer Depth on Demand with Structured Dropout"

Showing 50 of 406 citing papers.

• Generating Diverse Translation from Model Distribution with Dropout
  Xuanfu Wu, Yang Feng, Chenze Shao · 16 Oct 2020
• Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
  Gyuwan Kim, Kyunghyun Cho · 14 Oct 2020
• Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
  Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin · 14 Oct 2020
• Adversarial Self-Supervised Data-Free Distillation for Text Classification
  Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu · 10 Oct 2020
• Deep Learning Meets Projective Clustering
  Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman · 08 Oct 2020
• Population Based Training for Data Augmentation and Regularization in Speech Recognition
  Daniel Haziza, Jérémy Rapin, Gabriel Synnaeve · 08 Oct 2020
• On the importance of pre-training data volume for compact language models
  Vincent Micheli, Martin d'Hoffschmidt, François Fleuret · 08 Oct 2020
• AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models
  Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, A. Raghunathan · 07 Oct 2020
• Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
  Zi Lin, Jeremiah Zhe Liu, Ziao Yang, Nan Hua, Dan Roth · 05 Oct 2020
• Which *BERT? A Survey Organizing Contextualized Encoders
  Patrick Xia, Shijie Wu, Benjamin Van Durme · 02 Oct 2020
• AUBER: Automated BERT Regularization
  Hyun Dong Lee, Seongmin Lee, U. Kang · 30 Sep 2020
• Deep Transformers with Latent Depth
  Xian Li, Asa Cooper Stickland, Yuqing Tang, X. Kong · 28 Sep 2020
• TernaryBERT: Distillation-aware Ultra-low Bit BERT
  Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu · 27 Sep 2020
• Alleviating the Inequality of Attention Heads for Neural Machine Translation
  Zewei Sun, Shujian Huang, Xinyu Dai, Jiajun Chen · 21 Sep 2020
• Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
  Rajiv Movva, Jason Zhao · 17 Sep 2020
• Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
  M. Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman · 11 Sep 2020
• Compression of Deep Learning Models for Text: A Survey
  Manish Gupta, Puneet Agrawal · 12 Aug 2020
• ConvBERT: Improving BERT with Span-based Dynamic Convolution
  Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan · 06 Aug 2020
• Compressing Deep Neural Networks via Layer Fusion
  James O'Neill, Greg Ver Steeg, Aram Galstyan · 29 Jul 2020
• Contrastive Visual-Linguistic Pretraining
  Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su · 26 Jul 2020
• Diverse Ensembles Improve Calibration
  Asa Cooper Stickland, Iain Murray · 08 Jul 2020
• PyTorch Distributed: Experiences on Accelerating Data Parallel Training
  Shen Li, Yanli Zhao, R. Varma, Omkar Salpekar, P. Noordhuis, ..., Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala · 28 Jun 2020
• Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
  Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, ..., Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson · 22 Jun 2020
• wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
  Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli · 20 Jun 2020
• Multi-branch Attentive Transformer
  Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu · 18 Jun 2020
• BERT Loses Patience: Fast and Robust Inference with Early Exit
  Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei · 07 Jun 2020
• Normalized Attention Without Probability Cage
  Oliver Richter, Roger Wattenhofer · 19 May 2020
• Iterative Pseudo-Labeling for Speech Recognition
  Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Y. Hannun, Gabriel Synnaeve, R. Collobert · 19 May 2020
• Movement Pruning: Adaptive Sparsity by Fine-Tuning
  Victor Sanh, Thomas Wolf, Alexander M. Rush · 15 May 2020
• Adaptive Transformers for Learning Multimodal Representations
  Prajjwal Bhargava · 15 May 2020
• A Mixture of $h-1$ Heads is Better than $h$ Heads
  Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith · 13 May 2020
• GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
  Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos · 08 May 2020
• MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
  Louis Martin, Angela Fan, Eric Villemonte de la Clergerie, Antoine Bordes, Benoît Sagot · 01 May 2020
• Scheduled DropHead: A Regularization Method for Transformer Models
  Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou · 28 Apr 2020
• DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
  Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin · 27 Apr 2020
• Faster Depth-Adaptive Transformers
  Yijin Liu, Fandong Meng, Jie Zhou, Jinan Xu · 27 Apr 2020
• The Right Tool for the Job: Matching Model and Instance Complexities
  Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith · 16 Apr 2020
• Training with Quantization Noise for Extreme Model Compression
  Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Rémi Gribonval, Hervé Jégou, Armand Joulin · 15 Apr 2020
• On Optimal Transformer Depth for Low-Resource Language Translation
  Elan Van Biljon, Arnu Pretorius, Julia Kreutzer · 09 Apr 2020
• DynaBERT: Dynamic BERT with Adaptive Width and Depth
  Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu · 08 Apr 2020
• On the Effect of Dropping Layers of Pre-trained Transformer Models
  Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov · 08 Apr 2020
• PowerNorm: Rethinking Batch Normalization in Transformers
  Sheng Shen, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer · 17 Mar 2020
• A Primer in BERTology: What we know about how BERT works
  Anna Rogers, Olga Kovaleva, Anna Rumshisky · 27 Feb 2020
• Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
  Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett · 27 Feb 2020
• Addressing Some Limitations of Transformers with Feedback Memory
  Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar · 21 Feb 2020
• Controlling Computation versus Quality for Neural Sequence Models
  Ankur Bapna, N. Arivazhagan, Orhan Firat · 17 Feb 2020
• BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
  Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou · 07 Feb 2020
• Scaling Up Online Speech Recognition Using ConvNets
  Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, R. Collobert · 27 Jan 2020
• BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT
  Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-yi Lee · 25 Jan 2020
• FlauBERT: Unsupervised Language Model Pre-training for French
  Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, A. Allauzen, Benoît Crabbé, Laurent Besacier, D. Schwab · 11 Dec 2019