LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
arXiv:2004.04124
8 April 2020
Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang-Feng Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai
Papers citing "LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression" (17 of 17 papers shown)
CURing Large Models: Compression via CUR Decomposition. Sanghyeon Park, Soo-Mook Moon. 08 Jan 2025.
DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization. Rahul Chand, Yashoteja Prabhu, Pratyush Kumar. 20 Dec 2023.
Matrix Compression via Randomized Low Rank and Low Precision Factorization. R. Saha, Varun Srivastava, Mert Pilanci. 17 Oct 2023.
Training Large Language Models Efficiently with Sparsity and Dataflow. V. Srinivasan, Darshan Gandhi, Urmish Thakker, R. Prabhakar. 11 Apr 2023.
Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models. Mohammadreza Banaei, Klaudia Bałazy, Artur Kasymov, R. Lebret, Jacek Tabor, Karl Aberer. 08 Feb 2023.
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective. Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun. 03 Feb 2023.
Efficient Quantized Sparse Matrix Operations on Tensor Cores. Shigang Li, Kazuki Osawa, Torsten Hoefler. 14 Sep 2022.
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He. 04 Jun 2022.
NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM. Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu. 28 Oct 2021.
BERMo: What can BERT learn from ELMo? Sangamesh Kodge, Kaushik Roy. 18 Oct 2021.
KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation. Marzieh S. Tahaei, Ella Charlaix, V. Nia, A. Ghodsi, Mehdi Rezagholizadeh. 13 Sep 2021.
Block Pruning For Faster Transformers. François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush. 10 Sep 2021.
Utility is in the Eye of the User: A Critique of NLP Leaderboards. Kawin Ethayarajh, Dan Jurafsky. 29 Sep 2020.
TernaryBERT: Distillation-aware Ultra-low Bit BERT. Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu. 27 Sep 2020.
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. Timo Schick, Hinrich Schütze. 15 Sep 2020.
Large scale distributed neural network training through online distillation. Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton. 09 Apr 2018.
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights. Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen. 10 Feb 2017.