Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei
arXiv:1910.04732 (v2, latest), 10 October 2019
Papers citing "Structured Pruning of Large Language Models" (50 of 60 papers shown)
- AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer (Jiquan Shan, Junxiao Wang, Lifeng Zhao, Liang Cai, Hongyuan Zhang, Ioannis Liritzis; 22 May 2025)
- Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification (Jianlin Shi, Qiwei Gan, Elizabeth Hanchrow, Annie Bowles, John Stanley, Adam P. Bress, Jordana B. Cohen, Patrick R. Alba; 16 Apr 2025)
- Do we really have to filter out random noise in pre-training data for language models? (Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou; 10 Feb 2025)
- Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop (Ekaterina Artemova, Akim Tsvigun, Dominik Schlechtweg, Natalia Fedorova, Konstantin Chernyshev, Sergei Tilga, Boris Obmoroshev; 28 Jan 2025)
- Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts (Danyal Aftab, Steven Davy; 10 Jan 2025)
- CURing Large Models: Compression via CUR Decomposition (Sanghyeon Park, Soo-Mook Moon; 08 Jan 2025)
- FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers (Zehua Pei, Hui-Ling Zhen, Xianzhi Yu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu; 21 Nov 2024)
- MoDeGPT: Modular Decomposition for Large Language Model Compression (Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu; 19 Aug 2024)
- Finding Transformer Circuits with Edge Pruning (Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen; 24 Jun 2024)
- Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes (Lucio Dery, Steven Kolawole, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar; 08 Feb 2024)
- Neural Network Distiller: A Python Package For DNN Compression Research (Neta Zmora, Guy Jacob, Lev Zlotnik, Bar Elharar, Gal Novik; 27 Oct 2019)
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf; 02 Oct 2019)
- Optimizing Speech Recognition For The Edge (Yuan Shangguan, Jian Li, Qiao Liang, R. Álvarez, Ian McGraw; 26 Sep 2019)
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut; 26 Sep 2019)
- Reducing Transformer Depth on Demand with Structured Dropout (Angela Fan, Edouard Grave, Armand Joulin; 25 Sep 2019)
- TinyBERT: Distilling BERT for Natural Language Understanding (Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu; 23 Sep 2019)
- Transformer to CNN: Label-scarce distillation for efficient text classification (Yew Ken Chia, Sam Witteveen, Martin Andrews; 08 Sep 2019)
- Small and Practical BERT Models for Sequence Labeling (Henry Tsai, Jason Riesa, Melvin Johnson, N. Arivazhagan, Xin Li, Amelia Archer; 31 Aug 2019)
- Patient Knowledge Distillation for BERT Model Compression (S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu; 25 Aug 2019)
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova; 23 Aug 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov; 26 Jul 2019)
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov; 23 May 2019)
- Interpretable Neural Predictions with Differentiable Binary Variables (Jasmijn Bastings, Wilker Aziz, Ivan Titov; 20 May 2019)
- ERNIE: Enhanced Language Representation with Informative Entities (Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu; 17 May 2019)
- The State of Sparsity in Deep Neural Networks (Trevor Gale, Erich Elsen, Sara Hooker; 25 Feb 2019)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov; 09 Jan 2019)
- WEST: Word Encoded Sequence Transducers (Ehsan Variani, A. Suresh, M. Weintraub; 20 Nov 2018)
- Balanced Sparsity for Efficient DNN Inference on GPU (Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie; 01 Nov 2018)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova; 11 Oct 2018)
- Adaptive Input Representations for Neural Language Modeling (Alexei Baevski, Michael Auli; 28 Sep 2018)
- Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling (Liyuan Liu, Xiang Ren, Jingbo Shang, Jian Peng, Jiawei Han; 20 Apr 2018)
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman; 20 Apr 2018)
- An Analysis of Neural Language Modeling at Multiple Scales (Stephen Merity, N. Keskar, R. Socher; 22 Mar 2018)
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (Jonathan Frankle, Michael Carbin; 09 Mar 2018)
- Deep contextualized word representations (Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer; 15 Feb 2018)
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices (Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li Li, Song Han; 10 Feb 2018)
- Learning Sparse Neural Networks through L_0 Regularization (Christos Louizos, Max Welling, Diederik P. Kingma; 04 Dec 2017)
- Slim Embedding Layers for Recurrent Neural Language Models (Zhongliang Li, Raymond Kulhanek, Shaojun Wang, Yunxin Zhao, Shuang Wu; 27 Nov 2017)
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions (Gao Huang, Shichen Liu, Laurens van der Maaten, Kilian Q. Weinberger; 25 Nov 2017)
- Block-Sparse Recurrent Neural Networks (Sharan Narang, Eric Undersander, G. Diamos; 08 Nov 2017)
- To prune, or not to prune: exploring the efficacy of pruning for model compression (Michael Zhu, Suyog Gupta; 05 Oct 2017)
- Learning Intrinsic Sparse Structures within Long Short-Term Memory (W. Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, H. Li; 15 Sep 2017)
- Simple Recurrent Units for Highly Parallelizable Recurrence (Tao Lei, Yu Zhang, Sida I. Wang, Huijing Dai, Yoav Artzi; 08 Sep 2017)
- SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation (Daniel Cer, Mona T. Diab, Eneko Agirre, I. Lopez-Gazpio, Lucia Specia; 31 Jul 2017)
- On the State of the Art of Evaluation in Neural Language Models (Gábor Melis, Chris Dyer, Phil Blunsom; 18 Jul 2017)
- Attention Is All You Need (Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin; 12 Jun 2017)
- Bayesian Compression for Deep Learning (Christos Louizos, Karen Ullrich, Max Welling; 24 May 2017)
- Exploring Sparsity in Recurrent Neural Networks (Sharan Narang, Erich Elsen, G. Diamos, Shubho Sengupta; 17 Apr 2017)
- Variational Dropout Sparsifies Deep Neural Networks (Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov; 19 Jan 2017)
- Trained Ternary Quantization (Chenzhuo Zhu, Song Han, Huizi Mao, W. Dally; 04 Dec 2016)