Pruning Pre-trained Language Models with Principled Importance and Self-regularization
Siyu Ren, Kenny Q. Zhu
21 May 2023 · arXiv: 2305.12394
Papers citing "Pruning Pre-trained Language Models with Principled Importance and Self-regularization" (21 papers shown)
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, T. Zhao
78 · 80 · 0 · 25 Jun 2022
Parameter-Efficient Sparsity for Large Language Models Fine-Tuning
Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai
MQ · 119 · 34 · 0 · 23 May 2022
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, T. Zhao, Weizhu Chen
54 · 69 · 0 · 25 May 2021
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta
110 · 571 · 1 · 22 Dec 2020
DART: Open-Domain Structured Data Record to Text Generation
Linyong Nan, Dragomir R. Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, ..., Y. Tan, Xi Lin, Caiming Xiong, R. Socher, Nazneen Rajani
60 · 201 · 0 · 06 Jul 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh, Thomas Wolf, Alexander M. Rush
79 · 487 · 0 · 15 May 2020
Comparing Rewinding and Fine-tuning in Neural Network Pruning
Alex Renda, Jonathan Frankle, Michael Carbin
304 · 388 · 0 · 05 Mar 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
VLM · 188 · 1,284 · 0 · 25 Feb 2020
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL · AIMat · 375 · 6,472 · 0 · 26 Sep 2019
Patient Knowledge Distillation for BERT Model Compression
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
149 · 843 · 0 · 25 Aug 2019
Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge
Ondrej Dusek, Jekaterina Novikova, Verena Rieser
ELM · 103 · 233 · 0 · 23 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM · SSL · SSeg · 1.8K · 95,324 · 0 · 11 Oct 2018
SNIP: Single-shot Network Pruning based on Connection Sensitivity
Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr
VLM · 274 · 1,212 · 0 · 04 Oct 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 1.1K · 7,201 · 0 · 20 Apr 2018
Stronger generalization bounds for deep nets via a compression approach
Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang
MLT · AI4CE · 100 · 643 · 0 · 14 Feb 2018
Learning Sparse Neural Networks through L_0 Regularization
Christos Louizos, Max Welling, Diederik P. Kingma
444 · 1,148 · 0 · 04 Dec 2017
Decoupled Weight Decay Regularization
I. Loshchilov, Frank Hutter
OffRL · 154 · 2,158 · 0 · 14 Nov 2017
To prune, or not to prune: exploring the efficacy of pruning for model compression
Michael Zhu, Suyog Gupta
202 · 1,282 · 0 · 05 Oct 2017
Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV · 819 · 132,725 · 0 · 12 Jun 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
RALM · 318 · 8,177 · 0 · 16 Jun 2016
Unifying distillation and privileged information
David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, V. Vapnik
FedML · 178 · 463 · 0 · 11 Nov 2015