2110.02861
8-bit Optimizers via Block-wise Quantization
6 October 2021
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
Papers citing "8-bit Optimizers via Block-wise Quantization"
50 / 204 papers shown
LLM2KB: Constructing Knowledge Bases using instruction tuned context aware Large Language Models
Anmol Nayak
Hariprasad Timmapathini
17
4
0
25 Aug 2023
CED: Consistent ensemble distillation for audio tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
26
17
0
23 Aug 2023
ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
Zhuolun He
Haoyuan Wu
Xinyun Zhang
Xufeng Yao
Su Zheng
Haisheng Zheng
Bei Yu
LLMAG
34
50
0
20 Aug 2023
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology
Sean Wu
Michael Koo
L. Blum
A. Black
Liyo Kao
Fabien Scalzo
Ira Kurtz
LM&MA
ELM
AI4MH
18
42
0
09 Aug 2023
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang
Lin Zhang
S. Shi
X. Chu
Bo-wen Li
AI4CE
18
91
0
07 Aug 2023
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
54
102
0
03 Jul 2023
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Guanhua Wang
Heyang Qin
S. A. Jacobs
Connor Holmes
Samyam Rajbhandari
Olatunji Ruwase
Feng Yan
Lei Yang
Yuxiong He
VLM
59
57
0
16 Jun 2023
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Kai Lv
Yuqing Yang
Tengxiao Liu
Qi-jie Gao
Qipeng Guo
Xipeng Qiu
45
126
0
16 Jun 2023
The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models
Roy Voetman
Maya Aghaei
K. Dijkstra
DiffM
19
11
0
16 Jun 2023
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
Yuji Chai
John Gkountouras
Glenn G. Ko
David Brooks
Gu-Yeon Wei
MQ
30
19
0
13 Jun 2023
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
Baohao Liao
Shaomu Tan
Christof Monz
KELM
23
29
0
01 Jun 2023
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
34
4
0
29 May 2023
Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi
Tianyu Gao
Eshaan Nichani
Alexandru Damian
Jason D. Lee
Danqi Chen
Sanjeev Arora
27
177
0
27 May 2023
HARD: Hard Augmentations for Robust Distillation
Arne F. Nix
Max F. Burg
Fabian H. Sinz
AAML
36
1
0
24 May 2023
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers
Artidoro Pagnoni
Ari Holtzman
Luke Zettlemoyer
ALM
43
2,342
0
23 May 2023
Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science
Yida Mu
Benze Wu
William Thorne
Ambrose Robinson
Nikolaos Aletras
Carolina Scarton
Kalina Bontcheva
Xingyi Song
21
18
0
23 May 2023
PiVe: Prompting with Iterative Verification Improving Graph-based Generative Capability of LLMs
Jiuzhou Han
Nigel Collier
Wray L. Buntine
Ehsan Shareghi
70
36
0
21 May 2023
LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM
Wenhui Hua
Brian Williams
Davood Shamsi
28
3
0
10 May 2023
Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
24
38
0
25 Apr 2023
Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT
Jiawei Zhang
LRM
42
76
0
10 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
30
41
0
07 Apr 2023
Enhancing Large Language Models with Climate Resources
Mathias Kraus
J. Bingler
Markus Leippold
Tobias Schimanski
Chiara Colesanti-Senni
Dominik Stammbach
S. Vaghefi
Nicolas Webersinke
36
21
0
31 Mar 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
76
786
0
30 Mar 2023
Operating critical machine learning models in resource constrained regimes
Raghavendra Selvan
Julian Schon
Erik Dam
MedIm
31
8
0
17 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
41
4
0
03 Mar 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks
Iker García-Ferrero
Rodrigo Agerri
German Rigau
38
13
0
20 Dec 2022
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
21
214
0
19 Dec 2022
In Defense of Cross-Encoders for Zero-Shot Retrieval
G. Rosa
L. Bonifacio
Vitor Jeronymo
Hugo Queiroz Abonizio
Marzieh Fadaee
R. Lotufo
Rodrigo Nogueira
21
18
0
12 Dec 2022
HyperTuning: Toward Adapting Large Language Models without Back-propagation
Jason Phang
Yi Mao
Pengcheng He
Weizhu Chen
16
30
0
22 Nov 2022
Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training
Simla Burcu Harma
Canberk Sonmez
Nicholas Sperry
Babak Falsafi
Martin Jaggi
Yunho Oh
MQ
39
4
0
19 Nov 2022
How to Fine-Tune Vision Models with SGD
Ananya Kumar
Ruoqi Shen
Sébastien Bubeck
Suriya Gunasekar
VLM
8
29
0
17 Nov 2022
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
230
103
0
27 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
31
47
0
13 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
253
1,073
0
05 Oct 2022
PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically
Sedrick Scott Keh
Steven Y. Feng
Varun Gangal
Malihe Alikhani
Eduard H. Hovy
23
4
0
13 Sep 2022
Petals: Collaborative Inference and Fine-tuning of Large Models
Alexander Borzunov
Dmitry Baranchuk
Tim Dettmers
Max Ryabinin
Younes Belkada
Artem Chumachenko
Pavel Samygin
Colin Raffel
VLM
36
62
0
02 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
30
109
0
31 Aug 2022
Training a T5 Using Lab-sized Resources
Manuel R. Ciosici
Leon Derczynski
VLM
36
8
0
25 Aug 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
29
626
0
15 Aug 2022
Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI
S. Budennyy
V. Lazarev
N. Zakharenko
A. Korovin
Olga Plosskaya
...
Ivan V. Oseledets
I. Barsola
Ilya M. Egorov
A. Kosterina
L. Zhukov
37
90
0
31 Jul 2022
Training Transformers Together
Alexander Borzunov
Max Ryabinin
Tim Dettmers
Quentin Lhoest
Lucile Saulnier
Michael Diskin
Yacine Jernite
Thomas Wolf
ViT
31
8
0
07 Jul 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
26
251
0
06 Jun 2022
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan V. Oseledets
Olivier Beaumont
22
10
0
21 Feb 2022
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph
Irwan Bello
Sameer Kumar
Nan Du
Yanping Huang
J. Dean
Noam M. Shazeer
W. Fedus
MoE
24
181
0
17 Feb 2022
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
Georgii Sergeevich Novikov
Daniel Bershatsky
Julia Gusak
Alex Shonenkov
Denis Dimitrov
Ivan V. Oseledets
MQ
26
17
0
01 Feb 2022
Emojich -- zero-shot emoji generation using Russian language: a technical report
Alex Shonenkov
Daria Bakshandaeva
Denis Dimitrov
Aleks D. Nikolich
VLM
27
5
0
04 Dec 2021
MetaICL: Learning to Learn In Context
Sewon Min
M. Lewis
Luke Zettlemoyer
Hannaneh Hajishirzi
LRM
54
467
0
29 Oct 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
BembaSpeech: A Speech Recognition Corpus for the Bemba Language
Claytone Sikasote
Antonios Anastasopoulos
9
21
0
09 Feb 2021