Cited By

BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
A. Ramesh, Vignesh Ganapathiraman, I. Laradji, Mark Schmidt
arXiv:2406.17296, 25 June 2024
Papers citing "BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks" (18 of 18 papers shown)
1. GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, A. Anandkumar, Yuandong Tian. 06 Mar 2024.
2. ReLoRA: High-Rank Training Through Low-Rank Updates. Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky. 11 Jul 2023.
3. LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models. Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, Roy Ka-wei Lee. 04 Apr 2023.
4. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, ..., Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf. 09 Nov 2022.
5. Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning. Manas Gupta, Efe Camci, Vishandi Rudy Keneta, Abhishek Vaidyanathan, Ritwik Kanodia, Chuan-Sheng Foo, Wu Min, Lin Jie. 29 Sep 2022.
6. Exploring Low Rank Training of Deep Neural Networks. Siddhartha Rao Kamalakara, Acyr Locatelli, Bharat Venkitesh, Jimmy Ba, Y. Gal, Aidan Gomez. 27 Sep 2022.
7. Towards a Unified View of Parameter-Efficient Transfer Learning. Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig. 08 Oct 2021.
8. LoRA: Low-Rank Adaptation of Large Language Models. J. E. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen. 17 Jun 2021.
9. Lessons on Parameter Sharing across Layers in Transformers. Sho Takase, Shun Kiyono. 13 Apr 2021.
10. Scaling Laws for Neural Language Models. Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.
11. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 23 Oct 2019.
12. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. 02 Oct 2019.
13. Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. 25 Sep 2019.
14. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov. 26 Jul 2019.
15. Are Sixteen Heads Really Better than One? Paul Michel, Omer Levy, Graham Neubig. 25 May 2019.
16. Greedy Layerwise Learning Can Scale to ImageNet. Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon. 29 Dec 2018.
17. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
18. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. Song Han, Huizi Mao, W. Dally. 01 Oct 2015.