ResearchTrend.AI
Reducing Transformer Depth on Demand with Structured Dropout

25 September 2019
Angela Fan, Edouard Grave, Armand Joulin
ArXiv (abs) · PDF · HTML

Papers citing "Reducing Transformer Depth on Demand with Structured Dropout"

50 / 406 papers shown
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
Pranjal Aggarwal
Aman Madaan
Yiming Yang
Mausam
LRM
99
45
0
19 May 2023
LLM-Pruner: On the Structural Pruning of Large Language Models
Xinyin Ma
Gongfan Fang
Xinchao Wang
175
445
0
19 May 2023
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Yuxin Ren
Zi-Qi Zhong
Xingjian Shi
Yi Zhu
Chun Yuan
Mu Li
99
7
0
16 May 2023
Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling
Y. Zhu
Xuebing Yang
Yuanyuan Wu
Wensheng Zhang
MedIm
41
2
0
15 May 2023
The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification
Anastasiia Grishina
Max Hort
Leon Moonen
62
6
0
08 May 2023
Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
J. Zhong
Zheng Liu
Xiangshan Chen
ViT
82
17
0
21 Apr 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
111
231
0
14 Mar 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng
Jaesong Lee
Shinji Watanabe
67
25
0
14 Mar 2023
Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study
Salah Zaiem
Robin Algayres
Titouan Parcollet
S. Essid
Mirco Ravanelli
106
15
0
12 Mar 2023
X-Pruner: eXplainable Pruning for Vision Transformers
Lu Yu
Wei Xiang
ViT
60
50
0
08 Mar 2023
Gradient-Free Structured Pruning with Unlabeled Data
Azade Nova
H. Dai
Dale Schuurmans
SyDa
84
22
0
07 Mar 2023
BPT: Binary Point Cloud Transformer for Place Recognition
Zhixing Hou
Yuzhang Shang
Tian Gao
Yan Yan
MQ, ViT
71
3
0
02 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
163
106
0
27 Feb 2023
Towards multi-task learning of speech and speaker recognition
Nik Vaessen
David A. van Leeuwen
CVBM
24
0
0
24 Feb 2023
Speculative Decoding with Big Little Decoder
Sehoon Kim
K. Mangalam
Suhong Moon
Jitendra Malik
Michael W. Mahoney
A. Gholami
Kurt Keutzer
MoE
147
112
0
15 Feb 2023
Stitchable Neural Networks
Zizheng Pan
Jianfei Cai
Bohan Zhuang
102
25
0
13 Feb 2023
Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models
Mohammadreza Banaei
Klaudia Bałazy
Artur Kasymov
R. Lebret
Jacek Tabor
Karl Aberer
OffRL
45
0
0
08 Feb 2023
ZipLM: Inference-Aware Structured Pruning of Language Models
Eldar Kurtic
Elias Frantar
Dan Alistarh
MQ
101
26
0
07 Feb 2023
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle
Vanessa Mehlin
Sigurd Schacht
Carsten Lanquillon
HAI, MedIm
127
20
0
05 Feb 2023
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Ying Jin
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM, ViT
116
39
0
31 Jan 2023
Exploring Attention Map Reuse for Efficient Transformer Neural Networks
Kyuhong Shim
Jungwook Choi
Wonyong Sung
ViT
50
3
0
29 Jan 2023
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Xiaoxia Wu
Cheng-rong Li
Reza Yazdani Aminabadi
Z. Yao
Yuxiong He
MQ
74
25
0
27 Jan 2023
When Layers Play the Lottery, all Tickets Win at Initialization
Artur Jordão
George Correa de Araujo
H. Maia
Hélio Pedrini
57
4
0
25 Jan 2023
Adapting a Language Model While Preserving its General Knowledge
Zixuan Ke
Yijia Shao
Haowei Lin
Hu Xu
Lei Shu
Bin Liu
KELM, CLL, VLM
58
21
0
21 Jan 2023
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim Alabdulmohsin
Filip Pavetić
VLM
153
94
0
15 Dec 2022
Gradient-based Intra-attention Pruning on Pre-trained Language Models
Ziqing Yang
Yiming Cui
Xin Yao
Shijin Wang
VLM
71
12
0
15 Dec 2022
Co-training $2^L$ Submodels for Visual Recognition
Hugo Touvron
Matthieu Cord
Maxime Oquab
Piotr Bojanowski
Jakob Verbeek
Hervé Jégou
VLM
72
10
0
09 Dec 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
77
9
0
21 Nov 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
92
16
0
17 Nov 2022
Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
Tzu-Quan Lin
Tsung-Huan Yang
Chun-Yao Chang
Kuang-Ming Chen
Tzu-hsun Feng
Hung-yi Lee
Hao Tang
84
6
0
17 Nov 2022
Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT
Siyuan Lu
Chenchen Zhou
Keli Xie
Jun Lin
Zhongfeng Wang
42
1
0
16 Nov 2022
A Survey for Efficient Open Domain Question Answering
Qin Zhang
Shan Chen
Dongkuan Xu
Qingqing Cao
Xiaojun Chen
Trevor Cohn
Meng Fang
90
36
0
15 Nov 2022
FPT: Improving Prompt Tuning Efficiency via Progressive Training
Yufei Huang
Yujia Qin
Huadong Wang
Yichun Yin
Maosong Sun
Zhiyuan Liu
Qun Liu
VLM, LRM
61
6
0
13 Nov 2022
Speech-to-Speech Translation For A Real-world Unwritten Language
Peng-Jen Chen
Ke M. Tran
Yilin Yang
Jingfei Du
Justine T. Kao
...
Sravya Popuri
Changhan Wang
J. Pino
Wei-Ning Hsu
Ann Lee
93
26
0
11 Nov 2022
Bridging Fairness and Environmental Sustainability in Natural Language Processing
Marius Hessenthaler
Emma Strubell
Dirk Hovy
Anne Lauscher
92
8
0
08 Nov 2022
Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Yashesh Gaur
Nick Kibre
Jian Xue
Kangyuan Shu
Yuhui Wang
Issac Alphonso
Jinyu Li
Jiawei Liu
34
7
0
07 Nov 2022
More Speaking or More Speakers?
Dan Berrebbi
R. Collobert
Navdeep Jaitly
Tatiana Likhomanenko
49
6
0
02 Nov 2022
Empirical Evaluation of Post-Training Quantization Methods for Language Tasks
Ting Hu
Christoph Meinel
Haojin Yang
MQ
96
3
0
29 Oct 2022
Efficient Speech Translation with Dynamic Latent Perceivers
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussà
54
3
0
28 Oct 2022
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
Bowen Shen
Zheng Lin
Yuanxin Liu
Zhengxiao Liu
Lei Wang
Weiping Wang
VLM
77
5
0
27 Oct 2022
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Quchen Fu
Szu-Wei Fu
Yaran Fan
Yu-Huan Wu
Zhuo Chen
J. Gupchup
Ross Cutler
54
0
0
24 Oct 2022
PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models
Yupeng Zhang
Hongzhi Zhang
Sirui Wang
Wei Wu
Zhoujun Li
AAML
94
1
0
22 Oct 2022
Named Entity Detection and Injection for Direct Speech Translation
Marco Gaido
Yun Tang
Ilia Kulikov
Rongqing Huang
Hongyu Gong
Hirofumi Inaguma
77
3
0
21 Oct 2022
Continuous Pseudo-Labeling from the Start
Dan Berrebbi
R. Collobert
Samy Bengio
Navdeep Jaitly
Tatiana Likhomanenko
65
16
0
17 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
107
51
0
13 Oct 2022
Revisiting Structured Dropout
Yiren Zhao
Oluwatomisin Dada
Xitong Gao
Robert D. Mullins
BDL
70
2
0
05 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL, LRM
386
1,101
0
05 Oct 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
53
12
0
20 Sep 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
58
2
0
17 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
156
114
0
31 Aug 2022