On the Effect of Dropping Layers of Pre-trained Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
arXiv:2004.03844, 8 April 2020
Papers citing "On the Effect of Dropping Layers of Pre-trained Transformer Models" (27 papers shown)

1. "How Redundant Is the Transformer Stack in Speech Representation Models?" Teresa Dorszewski, Albert Kjøller Jacobsen, Lenka Tětková, Lars Kai Hansen. 20 Jan 2025.
2. "Merging Feed-Forward Sublayers for Compressed Transformers." Neha Verma, Kenton W. Murray, Kevin Duh. 10 Jan 2025. [AI4CE]
3. "FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model." Feijie Wu, Zitao Li, Yaliang Li, Bolin Ding, Jing Gao. 25 Jun 2024.
4. "Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent." Lin Wang, Zhichao Wang, Xiaoying Tang. 17 Jun 2024.
5. "S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs." Wei Zhong, Manasa Bharadwaj. 30 May 2024.
6. "The Unreasonable Ineffectiveness of the Deeper Layers." Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts. 26 Mar 2024.
7. "Where does In-context Translation Happen in Large Language Models." Suzanna Sia, David Mueller, Kevin Duh. 07 Mar 2024. [LRM]
8. "Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers." Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Farber. 18 Feb 2024.
9. "Graph Neural Networks for Antisocial Behavior Detection on Twitter." Martina Toshevska, S. Kalajdziski, Sonja Gievska. 28 Dec 2023.
10. "FedPEAT: Convergence of Federated Learning, Parameter-Efficient Fine Tuning, and Emulator Assisted Tuning for Artificial Intelligence Foundation Models with Mobile Edge Computing." Terence Jie Chua, Wen-li Yu, Junfeng Zhao, Kwok-Yan Lam. 26 Oct 2023. [FedML]
11. "Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models." Seungcheol Park, Ho-Jin Choi, U. Kang. 07 Aug 2023. [VLM]
12. "Deep Model Compression Also Helps Models Capture Ambiguity." Hancheol Park, Jong C. Park. 12 Jun 2023.
13. "The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification." Anastasiia Grishina, Max Hort, Leon Moonen. 08 May 2023.
14. "Gradient-Free Structured Pruning with Unlabeled Data." Azade Nova, H. Dai, Dale Schuurmans. 07 Mar 2023. [SyDa]
15. "Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers." Yuta Matsumoto, Benjamin Heinzerling, Masashi Yoshikawa, Kentaro Inui. 17 Jan 2023. [AIFin]
16. "On the Transformation of Latent Space in Fine-Tuned NLP Models." Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Firoj Alam. 23 Oct 2022.
17. "Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning." Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei. 18 Oct 2022.
18. "Efficient Methods for Natural Language Processing: A Survey." Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz. 31 Aug 2022.
19. "Differentiable Subset Pruning of Transformer Heads." Jiaoda Li, Ryan Cotterell, Mrinmaya Sachan. 10 Aug 2021.
20. "Learned Token Pruning for Transformers." Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer. 02 Jul 2021.
21. "What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis." Shammur A. Chowdhury, Nadir Durrani, Ahmed M. Ali. 01 Jul 2021.
22. "Comparing Rewinding and Fine-tuning in Neural Network Pruning." Alex Renda, Jonathan Frankle, Michael Carbin. 05 Mar 2020.
23. "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing." Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou. 07 Feb 2020.
24. "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT." Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer. 12 Sep 2019. [MQ]
25. "The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives." Elena Voita, Rico Sennrich, Ivan Titov. 03 Sep 2019.
26. "What you can cram into a single vector: Probing sentence embeddings for linguistic properties." Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni. 03 May 2018.
27. "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding." Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018. [ELM]