Reducing Transformer Depth on Demand with Structured Dropout

25 September 2019 · arXiv:1909.11556
Angela Fan, Edouard Grave, Armand Joulin

Papers citing "Reducing Transformer Depth on Demand with Structured Dropout"

Showing 50 of 406 citing papers.

FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer
Chi-Chih Chang, Yuan-Yao Sung, Shixing Yu, N. Huang, Diana Marculescu, Kai-Chiang Wu
ViT
07 Nov 2023

Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding
Jiali Zeng, Fandong Meng, Yongjing Yin, Jie Zhou
06 Nov 2023

TLM: Token-Level Masking for Transformers
Yangjun Wu, Kebin Fang, Dongxian Zhang, Han Wang, Hao Zhang, Gang Chen
28 Oct 2023

Switching Temporary Teachers for Semi-Supervised Semantic Segmentation
Jaemin Na, Jung-Woo Ha, HyungJin Chang, Dongyoon Han, Wonjun Hwang
28 Oct 2023

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, ..., Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou
24 Oct 2023

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou
24 Oct 2023

Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks
Zixuan Ke, Bing Liu, Wenhan Xiong, Asli Celikyilmaz, Haoran Li
CLL
13 Oct 2023

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee
13 Oct 2023

Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue, Nikolaos Aletras
11 Oct 2023

Can pruning make Large Language Models more efficient?
Sia Gholami, Marwan Omar
06 Oct 2023

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko, R. Collobert, Tatiana Likhomanenko
VLM
29 Sep 2023

Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan Honza Silovsky, Kunal Talwar, Christopher G. Brinton, Tatiana Likhomanenko
29 Sep 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
28 Sep 2023

CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning
W. Liu, Zhiyuan Peng, Tan Lee
21 Sep 2023

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference
Parsa Kavehzadeh, Mojtaba Valipour, Marzieh S. Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
16 Sep 2023

Neurons in Large Language Models: Dead, N-gram, Positional
Elena Voita, Javier Ferrando, Christoforos Nalmpantis
MILM
09 Sep 2023

Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework
Manal Helal
05 Sep 2023

Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Qiong Wu, Wei Yu, Yiyi Zhou, Shubin Huang, Xiaoshuai Sun, Rongrong Ji
VLM
04 Sep 2023

SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks
Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh S. Tahaei, Boxing Chen, Ali Ghodsi
01 Sep 2023

$\rm SP^3$: Enhancing Structured Pruning via PCA Projection
Yuxuan Hu, Jing Zhang, Zhe Zhao, Chengliang Zhao, Xiaodong Chen, Cuiping Li, Hong Chen
31 Aug 2023

Discrete Prompt Compression with Reinforcement Learning
Hoyoun Jung, Kyung-Joong Kim
17 Aug 2023

DPBERT: Efficient Inference for BERT based on Dynamic Planning
Weixin Wu, H. Zhuo
26 Jul 2023

Gradient Sparsification For Masked Fine-Tuning of Transformers
J. Ó. Neill, Sourav Dutta
19 Jul 2023

A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
16 Jul 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
12 Jul 2023

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill, Sourav Dutta
VLM, MQ
12 Jul 2023

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
12 Jul 2023

Learning to Group Auxiliary Datasets for Molecule
Ting Huang, Ziniu Hu, Rex Ying
08 Jul 2023

When Does Confidence-Based Cascade Deferral Suffice?
Wittawat Jitkrittum, Neha Gupta, A. Menon, Harikrishna Narasimhan, A. S. Rawat, Sanjiv Kumar
06 Jul 2023

Training Transformers with 4-bit Integers
Haocheng Xi, Changhao Li, Jianfei Chen, Jun Zhu
MQ
21 Jun 2023

A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter
20 Jun 2023

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao
20 Jun 2023

MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition
Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Chng Eng Siong
GAN
18 Jun 2023

Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu, Ruizhe Li, Cheng Chen, Chengwei Qin, Qiu-shi Zhu, Eng Siong Chng
18 Jun 2023

SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim, Coleman Hooper, A. Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
MQ
13 Jun 2023

Revisiting Token Pruning for Object Detection and Instance Segmentation
Yifei Liu, Mathias Gehrig, Nico Messikommer, Marco Cannici, Davide Scaramuzza
ViT, VLM
12 Jun 2023

Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency
Yuxuan Wang, Hong Lyu
05 Jun 2023

Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference
Wangchunshu Zhou, Ronan Le Bras, Yejin Choi
04 Jun 2023

The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Md Shamim Hussain, Mohammed J Zaki, D. Subramanian
02 Jun 2023

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe
28 May 2023

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi, Chaofan Tao, Anyi Rao, Zhendong Yang, Chun Yuan, Jiaqi Wang
VLM
27 May 2023

Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, Dacheng Tao
VLM
24 May 2023

Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning
Zhen-Ru Zhang, Chuanqi Tan, Haiyang Xu, Chengyu Wang, Jun Huang, Songfang Huang
24 May 2023

SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
Zekun Wang, Jingchang Chen, Wangchunshu Zhou, Haichao Zhu, Jiafeng Liang, Liping Shan, Ming Liu, Dongliang Xu, Qing Yang, Bing Qin
VLM
24 May 2023

Just CHOP: Embarrassingly Simple LLM Compression
A. Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy
24 May 2023

PruMUX: Augmenting Data Multiplexing with Model Compression
Yushan Su, Vishvak Murahari, Karthik Narasimhan, Keqin Li
24 May 2023

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xiang Wang, Jingwen Leng, Zhouhan Lin
ViT
24 May 2023

One-stop Training of Multiple Capacity Models
Lan Jiang, Haoyang Huang, Dongdong Zhang, R. Jiang, Furu Wei
23 May 2023

Infor-Coef: Information Bottleneck-based Dynamic Token Downsampling for Compact and Efficient language model
Wenxin Tan
21 May 2023

F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks
Xiangxiang Gao, Wei-wei Zhu, Jiasheng Gao, Congrui Yin
VLM
21 May 2023