LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
arXiv:2004.04124
8 April 2020
Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang-Feng Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai
Papers citing "LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression" (17 of 17 papers shown)
CURing Large Models: Compression via CUR Decomposition. Sanghyeon Park, Soo-Mook Moon. 08 Jan 2025.
DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization. Rahul Chand, Yashoteja Prabhu, Pratyush Kumar. 20 Dec 2023.
Matrix Compression via Randomized Low Rank and Low Precision Factorization. R. Saha, Varun Srivastava, Mert Pilanci. 17 Oct 2023.
Training Large Language Models Efficiently with Sparsity and Dataflow. V. Srinivasan, Darshan Gandhi, Urmish Thakker, R. Prabhakar. 11 Apr 2023.
Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models. Mohammadreza Banaei, Klaudia Bałazy, Artur Kasymov, R. Lebret, Jacek Tabor, Karl Aberer. 08 Feb 2023.
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective. Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun. 03 Feb 2023.
Efficient Quantized Sparse Matrix Operations on Tensor Cores. Shigang Li, Kazuki Osawa, Torsten Hoefler. 14 Sep 2022.
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He. 04 Jun 2022.
NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM. Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu. 28 Oct 2021.
BERMo: What can BERT learn from ELMo? Sangamesh Kodge, Kaushik Roy. 18 Oct 2021.
KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation. Marzieh S. Tahaei, Ella Charlaix, V. Nia, A. Ghodsi, Mehdi Rezagholizadeh. 13 Sep 2021.
Block Pruning For Faster Transformers. François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush. 10 Sep 2021.
Utility is in the Eye of the User: A Critique of NLP Leaderboards. Kawin Ethayarajh, Dan Jurafsky. 29 Sep 2020.
TernaryBERT: Distillation-aware Ultra-low Bit BERT. Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu. 27 Sep 2020.
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. Timo Schick, Hinrich Schütze. 15 Sep 2020.
Large scale distributed neural network training through online distillation. Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton. 09 Apr 2018.
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights. Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen. 10 Feb 2017.