2302.02390
Cited By
Quantized Distributed Training of Large Models with Convergence Guarantees
5 February 2023
I. Markov
Adrian Vladu
Qi Guo
Dan Alistarh
MQ
Papers citing
"Quantized Distributed Training of Large Models with Convergence Guarantees"
10 / 10 papers shown
Title
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
Jinda Jia
Cong Xie
Hanlin Lu
Daoce Wang
Hao Feng
...
Baixi Sun
Yanghua Peng
Zhi-Li Zhang
Xin Liu
Dingwen Tao
MQ
30
4
0
20 Oct 2024
Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients
Yan Li
Mingyi Li
Xiao Zhang
Guangwei Xu
Feng Chen
Yuan Yuan
Yifei Zou
Mengying Zhao
Jianbo Lu
Dongxiao Yu
32
0
0
11 Oct 2024
Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning
Wenxuan Zhou
Zhihao Qu
Shen-Huan Lyu
Miao Cai
Baoliu Ye
40
0
0
25 Aug 2024
Exploring Quantization for Efficient Pre-Training of Transformer Language Models
Kamran Chitsaz
Quentin Fournier
Gonçalo Mordido
Sarath Chandar
MQ
49
3
0
16 Jul 2024
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
Juntao Zhao
Borui Wan
Size Zheng
Haibin Lin
Yibo Zhu
Chuan Wu
29
3
0
02 Jul 2024
A Comparative Analysis of Distributed Training Strategies for GPT-2
Ishan Patwardhan
Shubham Gandhi
Om M. Khare
Amit Joshi
Suraj Sawant
37
1
0
24 May 2024
Knowledge Distillation Performs Partial Variance Reduction
M. Safaryan
Alexandra Peste
Dan Alistarh
30
6
0
27 May 2023
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
177
416
0
18 Jan 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,826
0
17 Sep 2019
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi
J. Nutini
Mark W. Schmidt
139
1,201
0
16 Aug 2016