Reducing Transformer Depth on Demand with Structured Dropout
arXiv:1909.11556 · 25 September 2019
Angela Fan, Edouard Grave, Armand Joulin

Cited By: papers citing "Reducing Transformer Depth on Demand with Structured Dropout" (50 of 406 papers shown)
- FocusFormer: Focusing on What We Need via Architecture Sampler
  Jing Liu, Jianfei Cai, Bohan Zhuang (23 Aug 2022)
- Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing
  Aditya Desai, K. Zhou, Anshumali Shrivastava (21 Jul 2022)
- Confident Adaptive Language Modeling
  Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler (14 Jul 2022)
- STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining
  Liwei Guo, Wonkyo Choe, F. Lin (11 Jul 2022)
- Adversarial Self-Attention for Language Understanding
  Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang (25 Jun 2022)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
  Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, T. Zhao (25 Jun 2022)
- Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices [MQ]
  Aaqib Saeed (17 Jun 2022)
- LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning [VLM]
  Yi-Lin Sung, Jaemin Cho, Joey Tianyi Zhou (13 Jun 2022)
- Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
  Aidan Good, Jia-Huei Lin, Hannah Sieg, Mikey Ferguson, Xin Yu, Shandian Zhe, J. Wieczorek, Thiago Serra (07 Jun 2022)
- ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers [VLM, MQ]
  Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He (04 Jun 2022)
- Extreme Compression for Pre-trained Transformers Made Simple and Efficient [MQ]
  Xiaoxia Wu, Z. Yao, Minjia Zhang, Conglong Li, Yuxiong He (04 Jun 2022)
- Improving the Robustness and Generalization of Deep Neural Network with Confidence Threshold Reduction [AAML, OOD]
  Xiangyuan Yang, Jie Lin, Hanlin Zhang, Xinyu Yang, Peng Zhao (02 Jun 2022)
- MiniDisc: Minimal Distillation Schedule for Language Model Compression
  Chen Zhang, Yang Yang, Qifan Wang, Jiahao Liu, Jingang Wang, Wei Wu, Dawei Song (29 May 2022)
- Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers [MoE]
  R. Liu, Young Jin Kim, Alexandre Muzio, Hany Awadalla (28 May 2022)
- HyperTree Proof Search for Neural Theorem Proving [AIMat]
  Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet, Amaury Hayat, Gabriel Ebner, Aurelien Rodriguez, Timothée Lacroix (23 May 2022)
- Task-specific Compression for Multi-task Language Models using Attribution-based Pruning
  Nakyeong Yang, Yunah Jang, Hwanhee Lee, Seohyeong Jung, Kyomin Jung (09 May 2022)
- Adaptable Adapters
  N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych (03 May 2022)
- On-demand compute reduction with stochastic wav2vec 2.0
  Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski (25 Apr 2022)
- A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation
  Yu Cao, Wei Bi, Meng Fang, Shuming Shi, Dacheng Tao (21 Apr 2022)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [ViT]
  Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji (16 Apr 2022)
- HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition
  J. Yoon, Beom Jun Woo, N. Kim (13 Apr 2022)
- Generating Full Length Wikipedia Biographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies
  Angela Fan, Claire Gardent (12 Apr 2022)
- A Call for Clarity in Beam Search: How It Works and When It Stops
  Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith (11 Apr 2022)
- Multimodal Transformer for Nursing Activity Recognition [ViT]
  Momal Ijaz, Renato Diaz, Chong Chen (09 Apr 2022)
- Speech Pre-training with Acoustic Piece [SSL]
  Shuo Ren, Shujie Liu, Yu Wu, Long Zhou, Furu Wei (07 Apr 2022)
- Structured Pruning Learns Compact and Accurate Models [VLM]
  Mengzhou Xia, Zexuan Zhong, Danqi Chen (01 Apr 2022)
- A Fast Post-Training Pruning Framework for Transformers
  Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami (29 Mar 2022)
- Training speaker recognition systems with limited data
  Nik Vaessen, David A. van Leeuwen (28 Mar 2022)
- Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
  Xin Huang, A. Khetan, Rene Bidart, Zohar Karnin (27 Mar 2022)
- Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
  Zhou Yu, Zitian Jin, Jun Yu, Mingliang Xu, Hongbo Wang, Jianping Fan (24 Mar 2022)
- Linearizing Transformer with Key-Value Memory
  Yizhe Zhang, Deng Cai (23 Mar 2022)
- ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
  Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, JingBo Zhu, Xuebo Liu, Min Zhang (17 Mar 2022)
- Unified Visual Transformer Compression [ViT]
  Shixing Yu, Tianlong Chen, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Liu, Zhangyang Wang (15 Mar 2022)
- The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models [VLM, MQ, MedIm]
  Eldar Kurtic, Daniel Fernando Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Ben Fineran, Michael Goin, Dan Alistarh (14 Mar 2022)
- Filter-enhanced MLP is All You Need for Sequential Recommendation
  Kun Zhou, Hui Yu, Wayne Xin Zhao, Ji-Rong Wen (28 Feb 2022)
- VLP: A Survey on Vision-Language Pre-training [VLM]
  Feilong Chen, Duzhen Zhang, Minglun Han, Xiuyi Chen, Jing Shi, Shuang Xu, Bo Xu (18 Feb 2022)
- EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation [MoE]
  Tao Ge, Si-Qing Chen, Furu Wei (16 Feb 2022)
- A Survey on Model Compression and Acceleration for Pretrained Language Models
  Canwen Xu, Julian McAuley (15 Feb 2022)
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
  Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, T. Zhao (06 Feb 2022)
- Star Temporal Classification: Sequence Classification with Partially Labeled Data
  Vineel Pratap, Awni Y. Hannun, Gabriel Synnaeve, R. Collobert (28 Jan 2022)
- SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
  Wenyong Huang, Zhenhe Zhang, Y. Yeung, Xin Jiang, Qun Liu (25 Jan 2022)
- Weight Expansion: A New Perspective on Dropout and Generalization
  Gao Jin, Xinping Yi, Pengfei Yang, Lijun Zhang, S. Schewe, Xiaowei Huang (23 Jan 2022)
- Can Model Compression Improve NLP Fairness
  Guangxuan Xu, Qingyuan Hu (21 Jan 2022)
- Pretrained Language Models for Text Generation: A Survey [AI4CE]
  Junyi Li, Tianyi Tang, Wayne Xin Zhao, J. Nie, Ji-Rong Wen (14 Jan 2022)
- Latency Adjustable Transformer Encoder for Language Understanding
  Sajjad Kachuee, M. Sharifkhani (10 Jan 2022)
- Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction [SSL]
  Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdel-rahman Mohamed (05 Jan 2022)
- Joint-training on Symbiosis Networks for Deep Neural Machine Translation models
  Zhengzhe Yu, Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, ..., Chang Su, Hao Fei, Lizhi Lei, Shimin Tao, Hao Yang (22 Dec 2021)
- From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression [VLM]
  Runxin Xu, Fuli Luo, Chengyu Wang, Baobao Chang, Jun Huang, Songfang Huang, Fei Huang (14 Dec 2021)
- On the Compression of Natural Language Models
  S. Damadi (13 Dec 2021)
- KPDrop: Improving Absent Keyphrase Generation
  Jishnu Ray Chowdhury, Seoyeon Park, Tuhin Kundu, Cornelia Caragea (02 Dec 2021)