Quantized Distributed Training of Large Models with Convergence Guarantees

I. Markov, Adrian Vladu, Qi Guo, Dan Alistarh
arXiv:2302.02390 · MQ · 5 February 2023
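The papers below cite a work on communication-quantized data-parallel training. As orientation only, here is a minimal sketch of the underlying primitive: unbiased (stochastic) uniform quantization of gradients before they are averaged across workers. It is a generic illustration, not the algorithm from the paper above; the function names, 4-bit default, and toy all-reduce are assumptions made for the example.

```python
# Minimal sketch of stochastic uniform gradient quantization (generic, not the
# paper's method). Each worker compresses its gradient to a few bits before the
# all-reduce; unbiased rounding keeps the averaged gradient correct in expectation.
import numpy as np

def quantize(grad: np.ndarray, bits: int = 4, rng=np.random.default_rng()):
    """Stochastically round `grad` to 2**bits uniform levels over its own range."""
    levels = 2 ** bits - 1
    lo, hi = grad.min(), grad.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (grad - lo) / scale              # values in [0, levels]
    floor = np.floor(normalized)
    prob_up = normalized - floor                  # round up with this probability
    q = floor + (rng.random(grad.shape) < prob_up)
    return q.astype(np.uint8), float(lo), float(scale)

def dequantize(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Map quantized levels back to floating point."""
    return q.astype(np.float64) * scale + lo

# Toy "all-reduce": average the dequantized gradients from 8 simulated workers.
grads = [np.random.randn(1024) for _ in range(8)]
avg = np.mean([dequantize(*quantize(g)) for g in grads], axis=0)
print(float(np.abs(avg - np.mean(grads, axis=0)).mean()))  # small quantization error
```

Because the rounding is unbiased, the averaged dequantized gradient equals the true average gradient in expectation, which is the property convergence analyses in this line of work typically build on.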

Papers citing "Quantized Distributed Training of Large Models with Convergence Guarantees"

10 / 10 papers shown
  • SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
    Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, ..., Baixi Sun, Yanghua Peng, Zhi-Li Zhang, Xin Liu, Dingwen Tao
    MQ · 30 · 4 · 0 · 20 Oct 2024
  • Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients
    Yan Li, Mingyi Li, Xiao Zhang, Guangwei Xu, Feng Chen, Yuan Yuan, Yifei Zou, Mengying Zhao, Jianbo Lu, Dongxiao Yu
    32 · 0 · 0 · 11 Oct 2024
  • Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning
    Wenxuan Zhou, Zhihao Qu, Shen-Huan Lyu, Miao Cai, Baoliu Ye
    40 · 0 · 0 · 25 Aug 2024
  • Exploring Quantization for Efficient Pre-Training of Transformer Language Models
    Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar
    MQ · 49 · 3 · 0 · 16 Jul 2024
  • QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
    Juntao Zhao, Borui Wan, Size Zheng, Haibin Lin, Yibo Zhu, Chuan Wu
    29 · 3 · 0 · 02 Jul 2024
  • A Comparative Analysis of Distributed Training Strategies for GPT-2
    Ishan Patwardhan, Shubham Gandhi, Om M. Khare, Amit Joshi, Suraj Sawant
    37 · 1 · 0 · 24 May 2024
  • Knowledge Distillation Performs Partial Variance Reduction
    M. Safaryan, Alexandra Peste, Dan Alistarh
    30 · 6 · 0 · 27 May 2023
  • ZeRO-Offload: Democratizing Billion-Scale Model Training
    Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
    MoE · 177 · 416 · 0 · 18 Jan 2021
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
    M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
    MoE · 245 · 1,826 · 0 · 17 Sep 2019
  • Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
    Hamed Karimi, J. Nutini, Mark W. Schmidt
    139 · 1,201 · 0 · 16 Aug 2016
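For reference on the last entry, the Polyak-Łojasiewicz (PL) condition is stated below in its standard textbook form; this is a generic statement, not a claim about how any particular citing paper applies it.

```latex
% PL condition for a differentiable f with minimum value f^*:
\frac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)
\qquad \text{for some } \mu > 0 \text{ and all } x.
% Under L-smoothness, gradient descent with step size 1/L then converges linearly
% without convexity:
f(x_k) - f^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^{*}\bigr).
```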