2110.02861
8-bit Optimizers via Block-wise Quantization
6 October 2021
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
Papers citing
"8-bit Optimizers via Block-wise Quantization"
50 / 203 papers shown
Title
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction
Mohammadtaha Bagherifard
Sahar Rajabi
Ali Edalat
Yadollah Yaghoobzadeh
KELM
17
0
0
16 May 2025
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim
Marwa El Halabi
W. Park
Clemens JS Schaefer
Deokjae Lee
Yeonhong Park
Jae W. Lee
Hyun Oh Song
MQ
29
0
0
11 May 2025
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Patrick Blumenberg
Thomas Graave
Tim Fingscheidt
MQ
19
0
0
10 May 2025
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL
VLM
41
0
0
06 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
K. Zhang
Lizhuang Ma
J. Wang
J. Wang
W. Zhang
MQ
57
0
0
01 May 2025
ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos
Patrick Giedemann
Pius von Daniken
Jan Deriu
Álvaro Rodrigo
Anselmo Peñas
Mark Cieliebak
32
0
0
17 Apr 2025
LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking
Qi Liu
Haozhe Duan
Yiqun Chen
Quanfeng Lu
Weiwei Sun
Jiaxin Mao
27
0
0
10 Apr 2025
PoGO: A Scalable Proof of Useful Work via Quantized Gradient Descent and Merkle Proofs
José I. Orlicki
29
0
0
10 Apr 2025
Gaussian Mixture Flow Matching Models
Hansheng Chen
Kai Zhang
Hao Tan
Zexiang Xu
Fujun Luan
Leonidas J. Guibas
Gordon Wetzstein
Sai Bi
DiffM
63
0
0
07 Apr 2025
Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models
Irtaza Khalid
Amir Masoud Nourollah
Steven Schockaert
LRM
40
0
0
30 Mar 2025
Cyborg Data: Merging Human with AI Generated Training Data
Kai North
Christopher Ormerod
37
0
0
26 Mar 2025
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
54
0
0
18 Mar 2025
Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang
Jia Wei
Jintao Zhang
Jun-Jie Zhu
Jianfei Chen
MQ
74
3
0
13 Mar 2025
Numerical Error Analysis of Large Language Models
Stanislav Budzinskiy
Wenyi Fang
Longbin Zeng
Philipp Petersen
47
1
0
13 Mar 2025
Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge
Maximilian Abstreiter
Sasu Tarkoma
Roberto Morabito
44
0
0
12 Mar 2025
Towards Superior Quantization Accuracy: A Layer-sensitive Approach
Feng Zhang
Yanbin Liu
Weihua Li
Jie Lv
Xiaodan Wang
Q. Bai
MQ
50
0
0
09 Mar 2025
SwiLTra-Bench: The Swiss Legal Translation Benchmark
Joel Niklaus
Jakob Merane
Luka Nenadic
Sina Ahmadi
Yingqiang Gao
...
Matthew Guillod
Robin Mamié
Daniel Brunner
Julio Pereyra
Niko Grupen
AILaw
ELM
76
0
0
03 Mar 2025
Identifying Sensitive Weights via Post-quantization Integral
Yuezhou Hu
Weiyu Huang
Zichen Liang
C. L. P. Chen
Jintao Zhang
J. Zhu
Jianfei Chen
MQ
39
2
0
28 Feb 2025
Climate And Resource Awareness is Imperative to Achieving Sustainable AI (and Preventing a Global AI Arms Race)
Pedram Bakhtiarifard
Pınar Tözün
Christian Igel
Raghavendra Selvan
57
0
0
27 Feb 2025
Towards Conditioning Clinical Text Generation for User Control
Osman Alperen Koras
Rabi Bahnan
Jens Kleesiek
Amin Dada
39
0
0
24 Feb 2025
SQLong: Enhanced NL2SQL for Longer Contexts with LLMs
D. Q. Nguyen
Cong Duy Vu Hoang
Duy Vu
Gioacchino Tangari
Thanh Vu
Don Dharmasiri
Yuan-Fang Li
Long Duong
46
0
0
23 Feb 2025
3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation
Hansheng Chen
Bokui Shen
Yulin Liu
Ruoxi Shi
Linqi Zhou
Connor Z. Lin
Jiayuan Gu
H. Su
Gordon Wetzstein
Leonidas J. Guibas
94
1
0
21 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
38
3
0
19 Feb 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
ALM
MQ
90
0
0
18 Feb 2025
Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation
Haoyuan Wu
Haisheng Zheng
Zhuolun He
Bei Yu
45
0
0
15 Feb 2025
SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation
Yixian Shen
Qi Bi
Jia-Hong Huang
Hongyi Zhu
Andy D. Pimentel
Anuj Pathania
46
0
0
08 Feb 2025
SubTrack your Grad: Gradient Subspace Tracking for Memory and Time Efficient Full-Parameter LLM Training
Sahar Rajabi
Nayeema Nonta
Sirisha Rambhatla
90
0
0
03 Feb 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
93
154
0
28 Jan 2025
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian
Wayne Xin Zhao
Ji-Rong Wen
MQ
41
0
0
22 Jan 2025
Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models
Tom Wallace
Naser Ezzati-Jivan
Beatrice Ombuki-Berman
MQ
38
1
0
16 Jan 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zhangyang Wang
Shiwei Liu
42
1
0
12 Jan 2025
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning
Lang Xu
Quentin G. Anthony
Jacob Hatef
A. Shafi
Hari Subramoni
Dhabaleswar K. Panda
32
0
0
08 Jan 2025
No More Adam: Learning Rate Scaling at Initialization is All You Need
Minghao Xu
Lichuan Xiang
Xu Cai
Hongkai Wen
80
2
0
16 Dec 2024
AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
Yuzhan Wang
Sicong Liu
Bin Guo
Boqi Zhang
Ke Ma
Yasan Ding
Hao Luo
Yao Li
Zhiwen Yu
74
1
0
01 Dec 2024
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao
S. Sang
Tiancheng Zhi
Jing Liu
Qing Yan
Linjie Luo
Bo Yuan
Bo Yuan
VLM
86
1
0
26 Nov 2024
Lion Cub: Minimizing Communication Overhead in Distributed Lion
Satoki Ishikawa
Tal Ben-Nun
B. Van Essen
Rio Yokota
Nikoli Dryden
79
0
0
25 Nov 2024
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen
Ahmed F. AbouElhamayed
Xilai Dai
Yang Wang
Marta Andronic
G. Constantinides
Mohamed S. Abdelfattah
MQ
103
1
0
18 Nov 2024
An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2
Pepijn de Reus
Ana Oprescu
Jelle Zuidema
MQ
85
1
0
15 Nov 2024
Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples
Noël Vouitsis
Rasa Hosseinzadeh
Brendan Leigh Ross
Valentin Villecroze
S. Gorti
Jesse C. Cresswell
G. Loaiza-Ganem
DiffM
48
0
0
13 Nov 2024
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
39
0
0
12 Nov 2024
Efficient Adaptive Optimization via Subset-Norm and Subspace-Momentum: Fast, Memory-Reduced Training with Convergence Guarantees
T. Nguyen
Huy Le Nguyen
ODL
33
0
0
11 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
46
13
0
07 Nov 2024
100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal
Tian Yun
Nihal V. Nayak
Jack Merullo
Stephen H. Bach
Chen Sun
Ellie Pavlick
VLM
AI4CE
OnRL
58
2
0
30 Oct 2024
You Never Know: Quantization Induces Inconsistent Biases in Vision-Language Foundation Models
Eric Slyman
Anirudh Kanneganti
Sanghyun Hong
Stefan Lee
VLM
MQ
39
0
0
26 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Y. Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
75
9
0
25 Oct 2024
Cross-lingual Transfer of Reward Models in Multilingual Alignment
Jiwoo Hong
Noah Lee
Rodrigo Martínez-Castaño
César Rodríguez
James Thorne
48
4
0
23 Oct 2024
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Thomas Robert
M. Safaryan
Ionut-Vlad Modoranu
Dan Alistarh
ODL
33
2
0
21 Oct 2024
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces
Ziming Yu
Pan Zhou
Sike Wang
Jia Li
Hua Huang
31
1
0
11 Oct 2024
PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
Preferred Elements:
Kenshin Abe
Kaizaburo Chubachi
Yasuhiro Fujita
...
Yoshihiko Ozaki
Shotaro Sano
Shuji Suzuki
Tianqi Xu
Toshihiko Yanase
36
0
0
10 Oct 2024
QuAILoRA: Quantization-Aware Initialization for LoRA
Neal Lawton
Aishwarya Padmakumar
Judith Gaspers
Jack FitzGerald
Anoop Kumar
Greg Ver Steeg
Aram Galstyan
MQ
31
0
0
09 Oct 2024