arXiv: 1909.11556
Cited By
Reducing Transformer Depth on Demand with Structured Dropout
25 September 2019
Angela Fan
Edouard Grave
Armand Joulin
Papers citing
"Reducing Transformer Depth on Demand with Structured Dropout"
50 / 400 papers shown
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Aidan Good
Jia-Huei Lin
Hannah Sieg
Mikey Ferguson
Xin Yu
Shandian Zhe
J. Wieczorek
Thiago Serra
37
11
0
07 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
50
442
0
04 Jun 2022
Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Xiaoxia Wu
Z. Yao
Minjia Zhang
Conglong Li
Yuxiong He
MQ
19
31
0
04 Jun 2022
Improving the Robustness and Generalization of Deep Neural Network with Confidence Threshold Reduction
Xiangyuan Yang
Jie Lin
Hanlin Zhang
Xinyu Yang
Peng Zhao
AAML
OOD
19
1
0
02 Jun 2022
MiniDisc: Minimal Distillation Schedule for Language Model Compression
Chen Zhang
Yang Yang
Qifan Wang
Jiahao Liu
Jingang Wang
Wei Yu Wu
Dawei Song
47
4
0
29 May 2022
Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers
R. Liu
Young Jin Kim
Alexandre Muzio
Hany Awadalla
MoE
47
22
0
28 May 2022
HyperTree Proof Search for Neural Theorem Proving
Guillaume Lample
Marie-Anne Lachaux
Thibaut Lavril
Xavier Martinet
Amaury Hayat
Gabriel Ebner
Aurelien Rodriguez
Timothée Lacroix
AIMat
28
134
0
23 May 2022
Task-specific Compression for Multi-task Language Models using Attribution-based Pruning
Nakyeong Yang
Yunah Jang
Hwanhee Lee
Seohyeong Jung
Kyomin Jung
16
8
0
09 May 2022
Adaptable Adapters
N. Moosavi
Quentin Delfosse
Kristian Kersting
Iryna Gurevych
48
21
0
03 May 2022
On-demand compute reduction with stochastic wav2vec 2.0
Apoorv Vyas
Wei-Ning Hsu
Michael Auli
Alexei Baevski
24
13
0
25 Apr 2022
A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation
Yu Cao
Wei Bi
Meng Fang
Shuming Shi
Dacheng Tao
29
48
0
21 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
14
43
0
16 Apr 2022
HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition
J. Yoon
Beom Jun Woo
N. Kim
27
13
0
13 Apr 2022
Generating Full Length Wikipedia Biographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies
Angela Fan
Claire Gardent
22
4
0
12 Apr 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
26
6
0
11 Apr 2022
Multimodal Transformer for Nursing Activity Recognition
Momal Ijaz
Renato Diaz
Cheng Chen
ViT
22
26
0
09 Apr 2022
Speech Pre-training with Acoustic Piece
Shuo Ren
Shujie Liu
Yu Wu
Long Zhou
Furu Wei
SSL
14
16
0
07 Apr 2022
Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia
Zexuan Zhong
Danqi Chen
VLM
9
177
0
01 Apr 2022
A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon
Sehoon Kim
Michael W. Mahoney
Joseph Hassoun
Kurt Keutzer
A. Gholami
29
144
0
29 Mar 2022
Training speaker recognition systems with limited data
Nik Vaessen
David A. van Leeuwen
11
6
0
28 Mar 2022
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang
A. Khetan
Rene Bidart
Zohar Karnin
19
14
0
27 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
33
4
0
24 Mar 2022
Linearizing Transformer with Key-Value Memory
Yizhe Zhang
Deng Cai
20
5
0
23 Mar 2022
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
Bei Li
Quan Du
Tao Zhou
Yi Jing
Shuhan Zhou
Xin Zeng
Tong Xiao
JingBo Zhu
Xuebo Liu
Min Zhang
22
31
0
17 Mar 2022
Unified Visual Transformer Compression
Shixing Yu
Tianlong Chen
Jiayi Shen
Huan Yuan
Jianchao Tan
Sen Yang
Ji Liu
Zhangyang Wang
ViT
19
92
0
15 Mar 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Eldar Kurtic
Daniel Fernando Campos
Tuan Nguyen
Elias Frantar
Mark Kurtz
Ben Fineran
Michael Goin
Dan Alistarh
VLM
MQ
MedIm
22
120
0
14 Mar 2022
Filter-enhanced MLP is All You Need for Sequential Recommendation
Kun Zhou
Hui Yu
Wayne Xin Zhao
Ji-Rong Wen
85
253
0
28 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
82
213
0
18 Feb 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Tao Ge
Si-Qing Chen
Furu Wei
MoE
26
21
0
16 Feb 2022
A Survey on Model Compression and Acceleration for Pretrained Language Models
Canwen Xu
Julian McAuley
23
58
0
15 Feb 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
Chen Liang
Haoming Jiang
Simiao Zuo
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
T. Zhao
17
14
0
06 Feb 2022
Star Temporal Classification: Sequence Classification with Partially Labeled Data
Vineel Pratap
Awni Y. Hannun
Gabriel Synnaeve
R. Collobert
17
8
0
28 Jan 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang
Zhenhe Zhang
Y. Yeung
Xin Jiang
Qun Liu
33
23
0
25 Jan 2022
Weight Expansion: A New Perspective on Dropout and Generalization
Gao Jin
Xinping Yi
Pengfei Yang
Lijun Zhang
S. Schewe
Xiaowei Huang
29
5
0
23 Jan 2022
Can Model Compression Improve NLP Fairness
Guangxuan Xu
Qingyuan Hu
28
26
0
21 Jan 2022
Pretrained Language Models for Text Generation: A Survey
Junyi Li
Tianyi Tang
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
AI4CE
36
127
0
14 Jan 2022
Latency Adjustable Transformer Encoder for Language Understanding
Sajjad Kachuee
M. Sharifkhani
29
0
0
10 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
35
305
0
05 Jan 2022
Joint-training on Symbiosis Networks for Deep Neural Machine Translation models
Zhengzhe Yu
Jiaxin Guo
Minghan Wang
Daimeng Wei
Hengchao Shang
...
Chang Su
M. Zhang
Lizhi Lei
Shimin Tao
Hao Yang
6
3
0
22 Dec 2021
From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression
Runxin Xu
Fuli Luo
Chengyu Wang
Baobao Chang
Jun Huang
Songfang Huang
Fei Huang
VLM
27
25
0
14 Dec 2021
On the Compression of Natural Language Models
S. Damadi
22
0
0
13 Dec 2021
KPDrop: Improving Absent Keyphrase Generation
Jishnu Ray Chowdhury
Seoyeon Park
Tuhin Kundu
Cornelia Caragea
27
7
0
02 Dec 2021
A Unified Pruning Framework for Vision Transformers
Hao Yu
Jianxin Wu
ViT
28
59
0
30 Nov 2021
Can depth-adaptive BERT perform better on binary classification tasks
Jing Fan
Xin Zhang
Sheng Zhang
Yan Pan
Lixiang Guo
MQ
12
0
0
22 Nov 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
SSL
32
657
0
17 Nov 2021
A Survey on Green Deep Learning
Jingjing Xu
Wangchunshu Zhou
Zhiyi Fu
Hao Zhou
Lei Li
VLM
73
83
0
08 Nov 2021
Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Xuanli He
I. Keivanloo
Yi Xu
Xiang He
Belinda Zeng
Santosh Rajagopalan
Trishul Chilimbi
10
18
0
30 Oct 2021
Pruning Attention Heads of Transformer Models Using A* Search: A Novel Approach to Compress Big NLP Architectures
Archit Parnami
Rahul Singh
Tarun Joshi
18
5
0
28 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
104
1,704
0
26 Oct 2021
When in Doubt, Summon the Titans: Efficient Inference with Large Models
A. S. Rawat
Manzil Zaheer
A. Menon
Amr Ahmed
Sanjiv Kumar
17
7
0
19 Oct 2021