Quantized Distributed Training of Large Models with Convergence Guarantees

I. Markov, Adrian Vladu, Qi Guo, Dan Alistarh
arXiv:2302.02390 · MQ · 5 February 2023
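The papers below cite a work on communication-quantized data-parallel training. As orientation only, here is a minimal sketch of the underlying primitive: unbiased (stochastic) uniform quantization of gradients before they are averaged across workers. It is a generic illustration, not the algorithm from the paper above; the function names, 4-bit default, and toy all-reduce are assumptions made for the example.

```python
# Minimal sketch of stochastic uniform gradient quantization (generic, not the
# paper's method). Each worker compresses its gradient to a few bits before the
# all-reduce; unbiased rounding keeps the averaged gradient correct in expectation.
import numpy as np

def quantize(grad: np.ndarray, bits: int = 4, rng=np.random.default_rng()):
    """Stochastically round `grad` to 2**bits uniform levels over its own range."""
    levels = 2 ** bits - 1
    lo, hi = grad.min(), grad.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (grad - lo) / scale              # values in [0, levels]
    floor = np.floor(normalized)
    prob_up = normalized - floor                  # round up with this probability
    q = floor + (rng.random(grad.shape) < prob_up)
    return q.astype(np.uint8), float(lo), float(scale)

def dequantize(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Map quantized levels back to floating point."""
    return q.astype(np.float64) * scale + lo

# Toy "all-reduce": average the dequantized gradients from 8 simulated workers.
grads = [np.random.randn(1024) for _ in range(8)]
avg = np.mean([dequantize(*quantize(g)) for g in grads], axis=0)
print(float(np.abs(avg - np.mean(grads, axis=0)).mean()))  # small quantization error
```

Because the rounding is unbiased, the averaged dequantized gradient equals the true average gradient in expectation, which is the property convergence analyses in this line of work typically build on.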

Papers citing "Quantized Distributed Training of Large Models with Convergence Guarantees"

10 / 10 papers shown
  • SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
    Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, ..., Baixi Sun, Yanghua Peng, Zhi-Li Zhang, Xin Liu, Dingwen Tao
    MQ · 30 · 4 · 0 · 20 Oct 2024
  • Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients
    Yan Li, Mingyi Li, Xiao Zhang, Guangwei Xu, Feng Chen, Yuan Yuan, Yifei Zou, Mengying Zhao, Jianbo Lu, Dongxiao Yu
    32 · 0 · 0 · 11 Oct 2024
  • Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning
    Wenxuan Zhou, Zhihao Qu, Shen-Huan Lyu, Miao Cai, Baoliu Ye
    40 · 0 · 0 · 25 Aug 2024
  • Exploring Quantization for Efficient Pre-Training of Transformer Language Models
    Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar
    MQ · 49 · 3 · 0 · 16 Jul 2024
  • QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
    Juntao Zhao, Borui Wan, Size Zheng, Haibin Lin, Yibo Zhu, Chuan Wu
    29 · 3 · 0 · 02 Jul 2024
  • A Comparative Analysis of Distributed Training Strategies for GPT-2
    Ishan Patwardhan, Shubham Gandhi, Om M. Khare, Amit Joshi, Suraj Sawant
    37 · 1 · 0 · 24 May 2024
  • Knowledge Distillation Performs Partial Variance Reduction
    M. Safaryan, Alexandra Peste, Dan Alistarh
    30 · 6 · 0 · 27 May 2023
  • ZeRO-Offload: Democratizing Billion-Scale Model Training
    Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
    MoE · 177 · 416 · 0 · 18 Jan 2021
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
    M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
    MoE · 245 · 1,826 · 0 · 17 Sep 2019
  • Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
    Hamed Karimi, J. Nutini, Mark W. Schmidt
    139 · 1,201 · 0 · 16 Aug 2016
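For reference on the last entry, the Polyak-Łojasiewicz (PL) condition is stated below in its standard textbook form; this is a generic statement, not a claim about how any particular citing paper applies it.

```latex
% PL condition for a differentiable f with minimum value f^*:
\frac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)
\qquad \text{for some } \mu > 0 \text{ and all } x.
% Under L-smoothness, gradient descent with step size 1/L then converges linearly
% without convexity:
f(x_k) - f^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^{*}\bigr).
```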